{"id":111616,"date":"2025-10-01T06:59:17","date_gmt":"2025-10-01T06:59:17","guid":{"rendered":"https:\/\/www.dumpsbase.com\/freedumps\/?p=111616"},"modified":"2025-09-30T07:02:02","modified_gmt":"2025-09-30T07:02:02","slug":"complete-your-nvidia-certified-professional-ai-infrastructure-exam-with-ncp-aii-dumps-v8-02-continue-to-check-ncp-aii-free-dumps-part-3-q81-q120","status":"publish","type":"post","link":"https:\/\/www.dumpsbase.com\/freedumps\/complete-your-nvidia-certified-professional-ai-infrastructure-exam-with-ncp-aii-dumps-v8-02-continue-to-check-ncp-aii-free-dumps-part-3-q81-q120.html","title":{"rendered":"Complete Your NVIDIA Certified Professional AI Infrastructure Exam with NCP-AII Dumps (V8.02): Continue to Check NCP-AII Free Dumps (Part 3, Q81-Q120)"},"content":{"rendered":"<p>How to complete your NVIDIA Certified Professional AI Infrastructure (NCP-AII) certification exam quickly and smoothly? You can choose the NCP-AII dumps (V8.02) and study all the latest exam questions and answers now. With DumpsBase\u2019s NCP-AII exam dumps, passing your NVIDIA NCP-AII certification exam can be more seamless and more feasible than you ever envisioned. Before downloading, you can read our free dumps online:<\/p>\n<ul>\n<li><a href=\"https:\/\/www.dumpsbase.com\/freedumps\/new-ncp-aii-dumps-v8-02-become-the-preferred-choice-for-making-preparations-check-the-nvidia-ncp-aii-free-dumps-part-1-q1-q40.html\"><em>NCP-AII free dumps (Part 1, Q1-Q40) of V8.02<\/em><\/a><\/li>\n<li><a href=\"https:\/\/www.dumpsbase.com\/freedumps\/download-the-nvidia-ai-infrastructure-ncp-aii-dumps-v8-02-and-start-preparation-today-continue-to-read-ncp-aii-free-dumps-part-2-q41-q80.html\"><em>NCP-AII free dumps (Part 2, Q41-Q80) of V8.02<\/em><\/a><\/li>\n<\/ul>\n<p>From these demos, you can confirm that DumpsBase offers a beacon of hope with its diligently crafted NVIDIA NCP-AII practice test questions, which include verified answers. 
We guarantee that you can achieve success in the NVIDIA NCP-AII exam. To help you check more, we continue to share additional demos, which include 40 more free questions online.<\/p>\n<h2>Continue to check our <span style=\"background-color: #33cccc;\"><em>NCP-AII free dumps (Part 3, Q81-Q120) of V8.02<\/em><\/span> online:<\/h2>\n<div  id=\"watupro_quiz\" class=\"quiz-area single-page-quiz\">\n<p id=\"submittingExam10795\" style=\"display:none;text-align:center;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/img\/loading.gif\" width=\"16\" height=\"16\"><\/p>\n\n<div class=\"watupro-exam-description\" id=\"description-quiz-10795\"><\/div>\n\n<form action=\"\" method=\"post\" class=\"quiz-form\" id=\"quiz-10795\"  enctype=\"multipart\/form-data\" >\n<div class='watu-question ' id='question-1' style=';'><div id='questionWrap-1'  class='   watupro-question-id-426189'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>1. <\/span>You are configuring a network bridge on a Linux host that will connect multiple physical network interfaces to a virtual machine. You need to ensure that the virtual machine receives an IP address via DHCP. <br \/>\r<br>Which of the following is the correct command sequence to create the bridge interface \u2018br0\u2019, add physical interfaces \u2018eth0\u2019 and \u2018eth1\u2019 to it, and bring up the bridge interface? Assume the required packages are installed. Consider using the \u2018ip\u2019 command. 
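For orientation only (the graded options follow as screenshots), a typical iproute2 sequence for this scenario might look like the sketch below; the interface names \u2018eth0\u2019/\u2018eth1\u2019 are the ones assumed in the question, and the commands require root:

```shell
# Sketch, not the graded answer: create br0, enslave eth0/eth1, bring everything up.
ip link add name br0 type bridge   # create the bridge device
ip link set eth0 master br0        # attach the first physical interface
ip link set eth1 master br0        # attach the second physical interface
ip link set eth0 up
ip link set eth1 up
ip link set br0 up                 # bring the bridge itself up
dhclient br0                       # request a DHCP lease on the bridge (host side)
```

A guest attached to \u2018br0\u2019 then makes its own DHCP request through the bridge.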
<br \/>\r<br>A ) <br \/>\r<br><br><img decoding=\"async\" width=649 height=18 id=\"\u56fe\u7247 32\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image002.jpg\"><br><br \/>\r<br>B ) <br \/>\r<br><br><img decoding=\"async\" width=649 height=9 id=\"\u56fe\u7247 31\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image003.jpg\"><br><br \/>\r<br>C ) <br \/>\r<br><br><img decoding=\"async\" width=649 height=13 id=\"\u56fe\u7247 30\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image004.jpg\"><br><br \/>\r<br>D ) <br \/>\r<br><br><img decoding=\"async\" width=649 height=7 id=\"\u56fe\u7247 29\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image005.jpg\"><br><br \/>\r<br>E ) <br \/>\r<br><br><img decoding=\"async\" width=650 height=12 id=\"\u56fe\u7247 28\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image006.jpg\"><br><\/div><input type='hidden' name='question_id[]' id='qID_1' value='426189' \/><input type='hidden' id='answerType426189' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426189[]' id='answer-id-1650111' class='answer   answerof-426189 ' value='1650111'   \/><label for='answer-id-1650111' id='answer-label-1650111' class=' answer'><span>Option A<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426189[]' id='answer-id-1650112' class='answer   answerof-426189 ' value='1650112'   \/><label for='answer-id-1650112' id='answer-label-1650112' class=' answer'><span>Option B<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426189[]' id='answer-id-1650113' class='answer   answerof-426189 ' value='1650113'   \/><label for='answer-id-1650113' 
id='answer-label-1650113' class=' answer'><span>Option C<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426189[]' id='answer-id-1650114' class='answer   answerof-426189 ' value='1650114'   \/><label for='answer-id-1650114' id='answer-label-1650114' class=' answer'><span>Option D<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426189[]' id='answer-id-1650115' class='answer   answerof-426189 ' value='1650115'   \/><label for='answer-id-1650115' id='answer-label-1650115' class=' answer'><span>Option E<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-2' style=';'><div id='questionWrap-2'  class='   watupro-question-id-426190'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>2. <\/span>You are using GPU Direct RDMA to enable fast data transfer between GPUs across multiple servers. You are experiencing performance degradation and suspect RDMA is not working correctly. 
<br \/>\r<br>How can you verify that GPU Direct RDMA is properly enabled and functioning?<\/div><input type='hidden' name='question_id[]' id='qID_2' value='426190' \/><input type='hidden' id='answerType426190' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426190[]' id='answer-id-1650116' class='answer   answerof-426190 ' value='1650116'   \/><label for='answer-id-1650116' id='answer-label-1650116' class=' answer'><span>Check the output of \u2018nvidia-smi topo -m\u2019 to ensure that the GPUs are connected via NVLink and have RDMA enabled.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426190[]' id='answer-id-1650117' class='answer   answerof-426190 ' value='1650117'   \/><label for='answer-id-1650117' id='answer-label-1650117' class=' answer'><span>Examine the \u2018dmesg\u2019 output for any errors related to RDMA or InfiniBand drivers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426190[]' id='answer-id-1650118' class='answer   answerof-426190 ' value='1650118'   \/><label for='answer-id-1650118' id='answer-label-1650118' class=' answer'><span>Use the \u2018ibstat\u2019 command to verify that the InfiniBand interfaces are active and connected.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426190[]' id='answer-id-1650119' class='answer   answerof-426190 ' value='1650119'   \/><label for='answer-id-1650119' id='answer-label-1650119' class=' answer'><span>Run a bandwidth benchmark using a suitable RDMA test tool to measure the RDMA throughput.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426190[]' id='answer-id-1650120' class='answer   answerof-426190 ' value='1650120'   \/><label 
for='answer-id-1650120' id='answer-label-1650120' class=' answer'><span>Ping the other servers to ensure network connectivity.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-3' style=';'><div id='questionWrap-3'  class='   watupro-question-id-426191'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>3. <\/span>You are deploying a new AI inference service using Triton Inference Server on a multi-GPU system. After deploying the models, you observe that only one GPU is being utilized, even though the models are configured to use multiple GPUs. <br \/>\r<br>What could be the possible causes for this?<\/div><input type='hidden' name='question_id[]' id='qID_3' value='426191' \/><input type='hidden' id='answerType426191' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426191[]' id='answer-id-1650121' class='answer   answerof-426191 ' value='1650121'   \/><label for='answer-id-1650121' id='answer-label-1650121' class=' answer'><span>The model configuration file does not specify the \u2018instance_group\u2019 parameter correctly to utilize multiple GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426191[]' id='answer-id-1650122' class='answer   answerof-426191 ' value='1650122'   \/><label for='answer-id-1650122' id='answer-label-1650122' class=' answer'><span>The Triton Inference Server is not configured to enable CUDA Multi-Process Service (MPS).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426191[]' id='answer-id-1650123' class='answer   answerof-426191 ' value='1650123'   \/><label for='answer-id-1650123' id='answer-label-1650123' class=' answer'><span>Insufficient CPU cores are available 
for the Triton Inference Server, limiting its ability to spawn multiple inference processes.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426191[]' id='answer-id-1650124' class='answer   answerof-426191 ' value='1650124'   \/><label for='answer-id-1650124' id='answer-label-1650124' class=' answer'><span>The models are not optimized for multi-GPU inference, resulting in a single GPU bottleneck.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426191[]' id='answer-id-1650125' class='answer   answerof-426191 ' value='1650125'   \/><label for='answer-id-1650125' id='answer-label-1650125' class=' answer'><span>The GPUs are not of the same type and Triton cannot properly schedule across them.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-4' style=';'><div id='questionWrap-4'  class='   watupro-question-id-426192'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>4. <\/span>You are running a large-scale distributed training job on a cluster of AMD EPYC servers, each equipped with multiple NVIDIA A100 GPUs. You are using Slurm for job scheduling. The training process often fails with NCCL errors related to network connectivity. <br \/>\r<br>What steps can you take to improve the reliability of the network communication for NCCL in this environment? 
Choose the MOST appropriate answers.<\/div><input type='hidden' name='question_id[]' id='qID_4' value='426192' \/><input type='hidden' id='answerType426192' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426192[]' id='answer-id-1650126' class='answer   answerof-426192 ' value='1650126'   \/><label for='answer-id-1650126' id='answer-label-1650126' class=' answer'><span>Ensure that the InfiniBand or RoCE network is properly configured and that all servers can communicate with each other over the network. Verify the network interface names and IP addresses in the NCCL configuration.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426192[]' id='answer-id-1650127' class='answer   answerof-426192 ' value='1650127'   \/><label for='answer-id-1650127' id='answer-label-1650127' class=' answer'><span>Use the Slurm \u2018srun\u2019 command with the \u2018--mpi=pmi2\u2019 option to launch the training job. 
This ensures that Slurm properly initializes the MPI environment and sets the NCCL environment variables.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426192[]' id='answer-id-1650128' class='answer   answerof-426192 ' value='1650128'   \/><label for='answer-id-1650128' id='answer-label-1650128' class=' answer'><span>Increase the \u2018NCCL_CONNECT_TIMEOUT\u2019 and \u2018NCCL_TIMEOUT\u2019 environment variables to allow for longer network delays.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426192[]' id='answer-id-1650129' class='answer   answerof-426192 ' value='1650129'   \/><label for='answer-id-1650129' id='answer-label-1650129' class=' answer'><span>Disable the firewall on all servers to allow unrestricted network communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426192[]' id='answer-id-1650130' class='answer   answerof-426192 ' value='1650130'   \/><label for='answer-id-1650130' id='answer-label-1650130' class=' answer'><span>Decrease the batch size to reduce the amount of data transferred over the network.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-5' style=';'><div id='questionWrap-5'  class='   watupro-question-id-426193'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>5. <\/span>You\u2019re designing a data center network for inference workloads. The primary requirement is high availability. 
<br \/>\r<br>Which of the following considerations are MOST important for your topology design?<\/div><input type='hidden' name='question_id[]' id='qID_5' value='426193' \/><input type='hidden' id='answerType426193' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426193[]' id='answer-id-1650131' class='answer   answerof-426193 ' value='1650131'   \/><label for='answer-id-1650131' id='answer-label-1650131' class=' answer'><span>Minimizing hop count<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426193[]' id='answer-id-1650132' class='answer   answerof-426193 ' value='1650132'   \/><label for='answer-id-1650132' id='answer-label-1650132' class=' answer'><span>Implementing redundant paths<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426193[]' id='answer-id-1650133' class='answer   answerof-426193 ' value='1650133'   \/><label for='answer-id-1650133' id='answer-label-1650133' class=' answer'><span>Using the cheapest possible switches<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426193[]' id='answer-id-1650134' class='answer   answerof-426193 ' value='1650134'   \/><label for='answer-id-1650134' id='answer-label-1650134' class=' answer'><span>Prioritizing north-south bandwidth over east-west bandwidth<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426193[]' id='answer-id-1650135' class='answer   answerof-426193 ' value='1650135'   \/><label for='answer-id-1650135' id='answer-label-1650135' class=' answer'><span>Centralized routing<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-6' 
style=';'><div id='questionWrap-6'  class='   watupro-question-id-426194'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>6. <\/span>A server with eight NVIDIA A100 GPUs experiences frequent CUDA errors during large model training. \u2018nvidia-smi\u2019 reports seemingly normal temperatures for all GPUs. However, upon closer inspection using IPMI, the inlet temperature for GPUs 3 and 4 is significantly higher than others. <br \/>\r<br>What is the MOST likely cause and the immediate action to take?<\/div><input type='hidden' name='question_id[]' id='qID_6' value='426194' \/><input type='hidden' id='answerType426194' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426194[]' id='answer-id-1650136' class='answer   answerof-426194 ' value='1650136'   \/><label for='answer-id-1650136' id='answer-label-1650136' class=' answer'><span>A driver issue is causing incorrect temperature reporting; reinstall the NVIDIA driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426194[]' id='answer-id-1650137' class='answer   answerof-426194 ' value='1650137'   \/><label for='answer-id-1650137' id='answer-label-1650137' class=' answer'><span>The temperature sensors on GPUs 3 and 4 are faulty; replace the GPUs immediately.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426194[]' id='answer-id-1650138' class='answer   answerof-426194 ' value='1650138'   \/><label for='answer-id-1650138' id='answer-label-1650138' class=' answer'><span>There is a localized airflow problem affecting GPUs 3 and 4; check fan speeds and airflow obstructions.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426194[]' id='answer-id-1650139' class='answer   answerof-426194 ' 
value='1650139'   \/><label for='answer-id-1650139' id='answer-label-1650139' class=' answer'><span>The power supply is failing to provide sufficient power to GPUs 3 and 4; replace the power supply.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426194[]' id='answer-id-1650140' class='answer   answerof-426194 ' value='1650140'   \/><label for='answer-id-1650140' id='answer-label-1650140' class=' answer'><span>A software bug in the CUDA toolkit is causing the errors; downgrade to an earlier version.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-7' style=';'><div id='questionWrap-7'  class='   watupro-question-id-426195'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>7. <\/span>A data scientist reports slow data loading times when training a large language model. The data is stored in a Ceph cluster. You suspect the client-side caching is not properly configured. <br \/>\r<br>Which Ceph configuration parameter(s) should you investigate and potentially adjust to improve data loading performance? 
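(Context sketch, not part of the original question: the CephFS client-side cache settings named in the options live in the [client] section of ceph.conf. The values below are the upstream defaults, shown only as illustrative starting points to tune, not recommendations.)

```ini
[client]
    # FUSE/CephFS client-side caching knobs (defaults shown; raise them to use more RAM).
    client cache size = 16384          # inode entries kept in the client metadata cache
    client oc size = 209715200         # object cache size in bytes (~200 MiB)
```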
Select all that apply.<\/div><input type='hidden' name='question_id[]' id='qID_7' value='426195' \/><input type='hidden' id='answerType426195' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426195[]' id='answer-id-1650141' class='answer   answerof-426195 ' value='1650141'   \/><label for='answer-id-1650141' id='answer-label-1650141' class=' answer'><span>client cache size<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426195[]' id='answer-id-1650142' class='answer   answerof-426195 ' value='1650142'   \/><label for='answer-id-1650142' id='answer-label-1650142' class=' answer'><span>client quota<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426195[]' id='answer-id-1650143' class='answer   answerof-426195 ' value='1650143'   \/><label for='answer-id-1650143' id='answer-label-1650143' class=' answer'><span>mds cache size<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426195[]' id='answer-id-1650144' class='answer   answerof-426195 ' value='1650144'   \/><label for='answer-id-1650144' id='answer-label-1650144' class=' answer'><span>fuse_client_max_background<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-8' style=';'><div id='questionWrap-8'  class='   watupro-question-id-426196'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>8. <\/span>A data center is designed for AI training with a high degree of east-west traffic. 
Considering cost and performance, which network topology is generally the most suitable?<\/div><input type='hidden' name='question_id[]' id='qID_8' value='426196' \/><input type='hidden' id='answerType426196' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426196[]' id='answer-id-1650145' class='answer   answerof-426196 ' value='1650145'   \/><label for='answer-id-1650145' id='answer-label-1650145' class=' answer'><span>Spine-Leaf<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426196[]' id='answer-id-1650146' class='answer   answerof-426196 ' value='1650146'   \/><label for='answer-id-1650146' id='answer-label-1650146' class=' answer'><span>Three-Tier<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426196[]' id='answer-id-1650147' class='answer   answerof-426196 ' value='1650147'   \/><label for='answer-id-1650147' id='answer-label-1650147' class=' answer'><span>Ring<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426196[]' id='answer-id-1650148' class='answer   answerof-426196 ' value='1650148'   \/><label for='answer-id-1650148' id='answer-label-1650148' class=' answer'><span>Bus<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426196[]' id='answer-id-1650149' class='answer   answerof-426196 ' value='1650149'   \/><label for='answer-id-1650149' id='answer-label-1650149' class=' answer'><span>Mesh<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-9' style=';'><div id='questionWrap-9'  class='   watupro-question-id-426197'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>9. 
<\/span>Your deep learning training job that utilizes NCCL (NVIDIA Collective Communications Library) for multi-GPU communication is failing with &quot;NCCL internal error, unhandled system error&quot; after a recent CUDA update. The error occurs during the \u2018all-reduce\u2019 operation. <br \/>\r<br>What is the most likely root cause and how would you address it?<\/div><input type='hidden' name='question_id[]' id='qID_9' value='426197' \/><input type='hidden' id='answerType426197' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426197[]' id='answer-id-1650150' class='answer   answerof-426197 ' value='1650150'   \/><label for='answer-id-1650150' id='answer-label-1650150' class=' answer'><span>Incompatible NCCL version with the new CUDA version. Update NCCL to a version compatible with the installed CUDA version.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426197[]' id='answer-id-1650151' class='answer   answerof-426197 ' value='1650151'   \/><label for='answer-id-1650151' id='answer-label-1650151' class=' answer'><span>Insufficient shared memory allocated to the CUDA context. Increase the shared memory limit using \u2018cudaDeviceSetLimit(cudaLimitSharedMemory, new_limit)\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426197[]' id='answer-id-1650152' class='answer   answerof-426197 ' value='1650152'   \/><label for='answer-id-1650152' id='answer-label-1650152' class=' answer'><span>Firewall rules blocking inter-GPU communication. 
Configure the firewall to allow communication on the NCCL-defined ports (typically 8000-8010).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426197[]' id='answer-id-1650153' class='answer   answerof-426197 ' value='1650153'   \/><label for='answer-id-1650153' id='answer-label-1650153' class=' answer'><span>Faulty network cables used for inter-node communication (if the training job spans multiple servers). Replace the network cables with certified high-speed cables.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426197[]' id='answer-id-1650154' class='answer   answerof-426197 ' value='1650154'   \/><label for='answer-id-1650154' id='answer-label-1650154' class=' answer'><span>GPU Direct RDMA is not properly configured. Check \u2018dmesg\u2019 for errors and ensure RDMA is enabled.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-10' style=';'><div id='questionWrap-10'  class='   watupro-question-id-426198'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>10. <\/span>You are tasked with diagnosing performance issues on a GPU server running a large-scale HPC simulation. The simulation utilizes multiple GPUs and InfiniBand for inter-GPU communication. You suspect that RDMA (Remote Direct Memory Access) is not functioning correctly. 
<br \/>\r<br>How would you comprehensively test and verify the proper operation of RDMA between the GPUs?<\/div><input type='hidden' name='question_id[]' id='qID_10' value='426198' \/><input type='hidden' id='answerType426198' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426198[]' id='answer-id-1650155' class='answer   answerof-426198 ' value='1650155'   \/><label for='answer-id-1650155' id='answer-label-1650155' class=' answer'><span>Use \u2018ping\u2019 to verify basic network connectivity between the server\u2019s InfiniBand interfaces.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426198[]' id='answer-id-1650156' class='answer   answerof-426198 ' value='1650156'   \/><label for='answer-id-1650156' id='answer-label-1650156' class=' answer'><span>Employ the bandwidth and latency tests from the \u2018perftest\u2019 suite to measure RDMA bandwidth and latency between GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426198[]' id='answer-id-1650157' class='answer   answerof-426198 ' value='1650157'   \/><label for='answer-id-1650157' id='answer-label-1650157' class=' answer'><span>Run \u2018nvidia-smi topo -m\u2019 to check the GPU interconnect topology and verify that NVLink or PCIe is being used for communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426198[]' id='answer-id-1650158' class='answer   answerof-426198 ' value='1650158'   \/><label for='answer-id-1650158' id='answer-label-1650158' class=' answer'><span>Utilize NCCL\u2019s internal diagnostic tools to verify proper inter-GPU communication within the simulation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426198[]' 
id='answer-id-1650159' class='answer   answerof-426198 ' value='1650159'   \/><label for='answer-id-1650159' id='answer-label-1650159' class=' answer'><span>Monitor CPU utilization during the simulation; high CPU usage suggests that RDMA is not offloading communication effectively.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-11' style=';'><div id='questionWrap-11'  class='   watupro-question-id-426199'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>11. <\/span>In a distributed training environment with NVLink switches, you need to optimize the data transfer between GPUs on different servers. <br \/>\r<br>Which strategy is most likely to minimize the impact of inter-server latency on the overall training time?<\/div><input type='hidden' name='question_id[]' id='qID_11' value='426199' \/><input type='hidden' id='answerType426199' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426199[]' id='answer-id-1650160' class='answer   answerof-426199 ' value='1650160'   \/><label for='answer-id-1650160' id='answer-label-1650160' class=' answer'><span>Increasing the batch size to amortize the cost of data transfers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426199[]' id='answer-id-1650161' class='answer   answerof-426199 ' value='1650161'   \/><label for='answer-id-1650161' id='answer-label-1650161' class=' answer'><span>Using asynchronous data transfers with overlapping computation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426199[]' id='answer-id-1650162' class='answer   answerof-426199 ' value='1650162'   \/><label for='answer-id-1650162' id='answer-label-1650162' class=' 
answer'><span>Compressing the data before transferring it over the network.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426199[]' id='answer-id-1650163' class='answer   answerof-426199 ' value='1650163'   \/><label for='answer-id-1650163' id='answer-label-1650163' class=' answer'><span>Using a centralized parameter server architecture.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426199[]' id='answer-id-1650164' class='answer   answerof-426199 ' value='1650164'   \/><label for='answer-id-1650164' id='answer-label-1650164' class=' answer'><span>Switching to a synchronous SGD (Stochastic Gradient Descent) algorithm.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-12' style=';'><div id='questionWrap-12'  class='   watupro-question-id-426200'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>12. <\/span>You need to remotely monitor the GPU temperature and utilization of a server without installing any additional software on the server itself. 
<br \/>\r<br>Assuming you have network access to the server\u2019s BMC (Baseboard Management Controller), which protocol and standard data format would BEST facilitate this?<\/div><input type='hidden' name='question_id[]' id='qID_12' value='426200' \/><input type='hidden' id='answerType426200' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426200[]' id='answer-id-1650165' class='answer   answerof-426200 ' value='1650165'   \/><label for='answer-id-1650165' id='answer-label-1650165' class=' answer'><span>SNMP (Simple Network Management Protocol) with MIB (Management Information Base)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426200[]' id='answer-id-1650166' class='answer   answerof-426200 ' value='1650166'   \/><label for='answer-id-1650166' id='answer-label-1650166' class=' answer'><span>HTTP with JSON<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426200[]' id='answer-id-1650167' class='answer   answerof-426200 ' value='1650167'   \/><label for='answer-id-1650167' id='answer-label-1650167' class=' answer'><span>SSH with plain text output from \u2018nvidia-smi\u2019<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426200[]' id='answer-id-1650168' class='answer   answerof-426200 ' value='1650168'   \/><label for='answer-id-1650168' id='answer-label-1650168' class=' answer'><span>IPMI (Intelligent Platform Management Interface) with SDR (Sensor Data Records)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426200[]' id='answer-id-1650169' class='answer   answerof-426200 ' value='1650169'   \/><label for='answer-id-1650169' id='answer-label-1650169' class=' answer'><span>Syslog with CSV 
(Comma-separated Values)<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-13' style=';'><div id='questionWrap-13'  class='   watupro-question-id-426201'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>13. <\/span>You are managing an AI infrastructure based on NVIDIA Spectrum-X switches. A new application requires strict Quality of Service (QoS) guarantees for its traffic. Specifically, you need to ensure that this application\u2019s traffic receives preferential treatment and minimal latency. <br \/>\r<br>What combination of Spectrum-X features and configurations would be MOST effective in achieving this?<\/div><input type='hidden' name='question_id[]' id='qID_13' value='426201' \/><input type='hidden' id='answerType426201' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426201[]' id='answer-id-1650170' class='answer   answerof-426201 ' value='1650170'   \/><label for='answer-id-1650170' id='answer-label-1650170' class=' answer'><span>Configure DiffServ Code Point (DSCP) marking on the application\u2019s traffic, map these DSCP values to specific traffic classes within the Spectrum-X switch, and configure Weighted Fair Queueing (WFQ) or Strict Priority Queueing on the egress ports.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426201[]' id='answer-id-1650171' class='answer   answerof-426201 ' value='1650171'   \/><label for='answer-id-1650171' id='answer-label-1650171' class=' answer'><span>Increase the MTU size on all interfaces to reduce packet fragmentation and overall latency.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426201[]' id='answer-id-1650172' class='answer   answerof-426201 ' 
value='1650172'   \/><label for='answer-id-1650172' id='answer-label-1650172' class=' answer'><span>Disable Adaptive Routing (AR) to ensure that traffic always takes the shortest path.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426201[]' id='answer-id-1650173' class='answer   answerof-426201 ' value='1650173'   \/><label for='answer-id-1650173' id='answer-label-1650173' class=' answer'><span>Use VLAN tagging to isolate the application\u2019s traffic into a separate virtual network.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426201[]' id='answer-id-1650174' class='answer   answerof-426201 ' value='1650174'   \/><label for='answer-id-1650174' id='answer-label-1650174' class=' answer'><span>Enable broadcast storm protection.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-14' style=';'><div id='questionWrap-14'  class='   watupro-question-id-426202'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>14. <\/span>You are tasked with installing a DGX A100 server. After racking and connecting power and network cables, you power it on, but the BMC (Baseboard Management Controller) is not accessible via the network. You have verified the network cable is connected and the switch port is active. 
<br \/>\r<br>What are the MOST likely causes and initial troubleshooting steps you should take?<\/div><input type='hidden' name='question_id[]' id='qID_14' value='426202' \/><input type='hidden' id='answerType426202' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650175' class='answer   answerof-426202 ' value='1650175'   \/><label for='answer-id-1650175' id='answer-label-1650175' class=' answer'><span>The BMC IP address is not configured or is on a different subnet. Check the BMC\u2019s network configuration using the DGX\u2019s front panel or via serial console. Verify DHCP is enabled and functioning or manually configure a static IP address.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650176' class='answer   answerof-426202 ' value='1650176'   \/><label for='answer-id-1650176' id='answer-label-1650176' class=' answer'><span>The BMC firmware is corrupted and needs to be reflashed using a USB drive. Check the DGX support site for the latest BMC firmware.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650177' class='answer   answerof-426202 ' value='1650177'   \/><label for='answer-id-1650177' id='answer-label-1650177' class=' answer'><span>The BMC is not powered on because the main power supply is faulty. 
Verify the power supply LEDs are lit and providing power to the system.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650178' class='answer   answerof-426202 ' value='1650178'   \/><label for='answer-id-1650178' id='answer-label-1650178' class=' answer'><span>The network switch port is not configured for the correct VLAN.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650179' class='answer   answerof-426202 ' value='1650179'   \/><label for='answer-id-1650179' id='answer-label-1650179' class=' answer'><span>Verify the switch port configuration to ensure it is on the same VLAN as the BMC.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426202[]' id='answer-id-1650180' class='answer   answerof-426202 ' value='1650180'   \/><label for='answer-id-1650180' id='answer-label-1650180' class=' answer'><span>The BMC is faulty and needs to be replaced. Contact NVIDIA support for RMA.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-15' style=';'><div id='questionWrap-15'  class='   watupro-question-id-426203'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>15. <\/span>A DGX A100 server with dual power supplies reports a critical power event in the BMC logs. One PSU shows a \u2018degraded\u2019 status, while the other appears normal. 
<br \/>\r<br>What immediate actions should you take to ensure continued operation and prevent data loss?<\/div><input type='hidden' name='question_id[]' id='qID_15' value='426203' \/><input type='hidden' id='answerType426203' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426203[]' id='answer-id-1650181' class='answer   answerof-426203 ' value='1650181'   \/><label for='answer-id-1650181' id='answer-label-1650181' class=' answer'><span>Immediately shut down the server gracefully to prevent further damage to the faulty PSU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426203[]' id='answer-id-1650182' class='answer   answerof-426203 ' value='1650182'   \/><label for='answer-id-1650182' id='answer-label-1650182' class=' answer'><span>Hot-swap the degraded PSU with a replacement unit.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426203[]' id='answer-id-1650183' class='answer   answerof-426203 ' value='1650183'   \/><label for='answer-id-1650183' id='answer-label-1650183' class=' answer'><span>Monitor the remaining PSU\u2019s load and temperature closely; if stable, continue operation until a scheduled maintenance window.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426203[]' id='answer-id-1650184' class='answer   answerof-426203 ' value='1650184'   \/><label for='answer-id-1650184' id='answer-label-1650184' class=' answer'><span>Reduce the GPU power limit using \u2018nvidia-smi\u2019 to decrease the overall power consumption of the server.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426203[]' id='answer-id-1650185' class='answer   answerof-426203 ' value='1650185'   
\/><label for='answer-id-1650185' id='answer-label-1650185' class=' answer'><span>Migrate all workloads to other servers in the cluster to minimize the impact of a potential complete PSU failure.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-16' style=';'><div id='questionWrap-16'  class='   watupro-question-id-426204'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>16. <\/span>You are tasked with troubleshooting a performance bottleneck in a multi-node, multi-GPU deep learning training job utilizing Horovod. <br \/>\r<br>The training loss is decreasing, but the overall training time is significantly longer than expected. <br \/>\r<br>Which of the following monitoring approaches would provide the most insight into the cause of the bottleneck?<\/div><input type='hidden' name='question_id[]' id='qID_16' value='426204' \/><input type='hidden' id='answerType426204' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426204[]' id='answer-id-1650186' class='answer   answerof-426204 ' value='1650186'   \/><label for='answer-id-1650186' id='answer-label-1650186' class=' answer'><span>Using \u2018nvidia-smi\u2019 on each node to monitor GPU utilization and memory usage.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426204[]' id='answer-id-1650187' class='answer   answerof-426204 ' value='1650187'   \/><label for='answer-id-1650187' id='answer-label-1650187' class=' answer'><span>Enabling Horovod\u2019s timeline and profiling features to visualize the communication patterns and identify synchronization bottlenecks.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426204[]' id='answer-id-1650188' class='answer   
answerof-426204 ' value='1650188'   \/><label for='answer-id-1650188' id='answer-label-1650188' class=' answer'><span>Monitoring network bandwidth utilization on each node using \u2018iftop\u2019 or \u2018iperf3\u2019<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426204[]' id='answer-id-1650189' class='answer   answerof-426204 ' value='1650189'   \/><label for='answer-id-1650189' id='answer-label-1650189' class=' answer'><span>Analyzing the training loss curve to identify potential issues with the model architecture or hyperparameters.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426204[]' id='answer-id-1650190' class='answer   answerof-426204 ' value='1650190'   \/><label for='answer-id-1650190' id='answer-label-1650190' class=' answer'><span>Using \u2018htop\u2019 to monitor CPU utilization on each node.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-17' style=';'><div id='questionWrap-17'  class='   watupro-question-id-426205'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>17. 
<\/span>Which command-line tool is typically used to monitor the status and performance of an NVIDIA NVLink Switch?<\/div><input type='hidden' name='question_id[]' id='qID_17' value='426205' \/><input type='hidden' id='answerType426205' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426205[]' id='answer-id-1650191' class='answer   answerof-426205 ' value='1650191'   \/><label for='answer-id-1650191' id='answer-label-1650191' class=' answer'><span>nvidia-smi<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426205[]' id='answer-id-1650192' class='answer   answerof-426205 ' value='1650192'   \/><label for='answer-id-1650192' id='answer-label-1650192' class=' answer'><span>nvswitch-cli<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426205[]' id='answer-id-1650193' class='answer   answerof-426205 ' value='1650193'   \/><label for='answer-id-1650193' id='answer-label-1650193' class=' answer'><span>ibstat<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426205[]' id='answer-id-1650194' class='answer   answerof-426205 ' value='1650194'   \/><label for='answer-id-1650194' id='answer-label-1650194' class=' answer'><span>rocminfo<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426205[]' id='answer-id-1650195' class='answer   answerof-426205 ' value='1650195'   \/><label for='answer-id-1650195' id='answer-label-1650195' class=' answer'><span>lspci<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-18' style=';'><div id='questionWrap-18'  class='   watupro-question-id-426206'>\n\t\t\t<div 
class='question-content'><div><span class='watupro_num'>18. <\/span>You are using NVIDIA Spectrum-X switches in your AI infrastructure. You observe high latency between two GPU servers during a large distributed training job. After analyzing the switch telemetry, you suspect a suboptimal routing path is contributing to the problem. <br \/>\r<br>Which of the following methods offers the MOST granular control for influencing traffic flow within the Spectrum-X fabric to mitigate this?<\/div><input type='hidden' name='question_id[]' id='qID_18' value='426206' \/><input type='hidden' id='answerType426206' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426206[]' id='answer-id-1650196' class='answer   answerof-426206 ' value='1650196'   \/><label for='answer-id-1650196' id='answer-label-1650196' class=' answer'><span>Adjust the Equal-Cost Multi-Path (ECMP) hashing algorithm globally on all switches.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426206[]' id='answer-id-1650197' class='answer   answerof-426206 ' value='1650197'   \/><label for='answer-id-1650197' id='answer-label-1650197' class=' answer'><span>Configure QoS (Quality of Service) policies to prioritize traffic from the high-latency GPU servers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426206[]' id='answer-id-1650198' class='answer   answerof-426206 ' value='1650198'   \/><label for='answer-id-1650198' id='answer-label-1650198' class=' answer'><span>Implement Adaptive Routing (AR) or Dynamic Load Balancing (DLB) features available on the Spectrum-X switches to dynamically adjust paths based on network conditions.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426206[]' 
id='answer-id-1650199' class='answer   answerof-426206 ' value='1650199'   \/><label for='answer-id-1650199' id='answer-label-1650199' class=' answer'><span>Manually configure static routes on the Spectrum-X switches to force traffic between the GPU servers along a specific path.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426206[]' id='answer-id-1650200' class='answer   answerof-426206 ' value='1650200'   \/><label for='answer-id-1650200' id='answer-label-1650200' class=' answer'><span>Disable IPv6 to simplify routing decisions.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-19' style=';'><div id='questionWrap-19'  class='   watupro-question-id-426207'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>19. <\/span>You are running a distributed training job on a multi-GPU server. After several hours, the job fails with a NCCL (NVIDIA Collective Communications Library) error. The error message indicates a failure in inter-GPU communication. \u2018nvidia-smi\u2019 shows all GPUs are healthy. 
<br \/>\r<br>What is the MOST probable cause of this issue?<\/div><input type='hidden' name='question_id[]' id='qID_19' value='426207' \/><input type='hidden' id='answerType426207' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426207[]' id='answer-id-1650201' class='answer   answerof-426207 ' value='1650201'   \/><label for='answer-id-1650201' id='answer-label-1650201' class=' answer'><span>A bug in the NCCL library itself; downgrade to a previous version of NCCL.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426207[]' id='answer-id-1650202' class='answer   answerof-426207 ' value='1650202'   \/><label for='answer-id-1650202' id='answer-label-1650202' class=' answer'><span>Incorrect NCCL configuration, such as an invalid network interface or incorrect device affinity settings.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426207[]' id='answer-id-1650203' class='answer   answerof-426207 ' value='1650203'   \/><label for='answer-id-1650203' id='answer-label-1650203' class=' answer'><span>Insufficient inter-GPU bandwidth; reduce the batch size to decrease communication overhead.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426207[]' id='answer-id-1650204' class='answer   answerof-426207 ' value='1650204'   \/><label for='answer-id-1650204' id='answer-label-1650204' class=' answer'><span>A faulty network cable connecting the server to the rest of the cluster.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426207[]' id='answer-id-1650205' class='answer   answerof-426207 ' value='1650205'   \/><label for='answer-id-1650205' id='answer-label-1650205' class=' 
answer'><span>Driver incompatibility issue between NCCL and the installed NVIDIA driver version.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-20' style=';'><div id='questionWrap-20'  class='   watupro-question-id-426208'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>20. <\/span>When installing a GPU driver on a Linux system that already has a previous driver version installed, what is the recommended procedure to ensure a clean and stable installation?<\/div><input type='hidden' name='question_id[]' id='qID_20' value='426208' \/><input type='hidden' id='answerType426208' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426208[]' id='answer-id-1650206' class='answer   answerof-426208 ' value='1650206'   \/><label for='answer-id-1650206' id='answer-label-1650206' class=' answer'><span>Simply install the new driver package using \u2018apt install\u2019 or \u2018yum install\u2019 without removing the old driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426208[]' id='answer-id-1650207' class='answer   answerof-426208 ' value='1650207'   \/><label for='answer-id-1650207' id='answer-label-1650207' class=' answer'><span>Blacklist the nouveau driver, download the CUDA toolkit, and run the installation script with default options.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426208[]' id='answer-id-1650208' class='answer   answerof-426208 ' value='1650208'   \/><label for='answer-id-1650208' id='answer-label-1650208' class=' answer'><span>Purge the existing NVIDIA driver packages using \u2018apt purge nvidia-*\u2019 or \u2018yum remove nvidia-*\u2019, reboot the system, and then install the new driver 
package.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426208[]' id='answer-id-1650209' class='answer   answerof-426208 ' value='1650209'   \/><label for='answer-id-1650209' id='answer-label-1650209' class=' answer'><span>Run \u2018nvidia-uninstall\u2019 if it exists, otherwise manually remove the NVIDIA kernel modules and libraries from \u2018\/lib\/modules\u2019 and \u2018\/usr\/lib\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426208[]' id='answer-id-1650210' class='answer   answerof-426208 ' value='1650210'   \/><label for='answer-id-1650210' id='answer-label-1650210' class=' answer'><span>Install the new driver using the \u2018.run\u2019 file from NVIDIA\u2019s website, accepting all default options.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-21' style=';'><div id='questionWrap-21'  class='   watupro-question-id-426209'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>21. <\/span>You are monitoring a server with 8 GPUs used for deep learning training. You observe that one of the GPUs reports a significantly lower utilization rate compared to the others, even though the workload is designed to distribute evenly. \u2018nvidia-smi\u2019 reports a persistent &quot;XID 13&quot; error for that GPU. 
<br \/>\r<br>What is the most likely cause?<\/div><input type='hidden' name='question_id[]' id='qID_21' value='426209' \/><input type='hidden' id='answerType426209' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426209[]' id='answer-id-1650211' class='answer   answerof-426209 ' value='1650211'   \/><label for='answer-id-1650211' id='answer-label-1650211' class=' answer'><span>A driver bug causing incorrect workload distribution.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426209[]' id='answer-id-1650212' class='answer   answerof-426209 ' value='1650212'   \/><label for='answer-id-1650212' id='answer-label-1650212' class=' answer'><span>Insufficient system memory preventing data transfer to that GPU.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426209[]' id='answer-id-1650213' class='answer   answerof-426209 ' value='1650213'   \/><label for='answer-id-1650213' id='answer-label-1650213' class=' answer'><span>A hardware fault within the GPU, such as a memory error or core failure.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426209[]' id='answer-id-1650214' class='answer   answerof-426209 ' value='1650214'   \/><label for='answer-id-1650214' id='answer-label-1650214' class=' answer'><span>An incorrect CUDA version installed.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426209[]' id='answer-id-1650215' class='answer   answerof-426209 ' value='1650215'   \/><label for='answer-id-1650215' id='answer-label-1650215' class=' answer'><span>The GPU\u2019s compute mode is set to \u2018Exclusive Process\u2019.<\/span><\/label><\/div><!-- end 
questionWrap--><\/div><\/div><div class='watu-question ' id='question-22' style=';'><div id='questionWrap-22'  class='   watupro-question-id-426210'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>22. <\/span>You have a large dataset stored on a BeeGFS file system. The training job is single node and uses data augmentation to generate more data on the fly. The data augmentation process is CPU-bound, but you notice that the GPU is underutilized due to the training data not being fed to the GPU fast enough. <br \/>\r<br>How can you reduce the load on the CPU and improve the overall training throughput?<\/div><input type='hidden' name='question_id[]' id='qID_22' value='426210' \/><input type='hidden' id='answerType426210' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426210[]' id='answer-id-1650216' class='answer   answerof-426210 ' value='1650216'   \/><label for='answer-id-1650216' id='answer-label-1650216' class=' answer'><span>Move the training data to a local NVMe drive on the training node.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426210[]' id='answer-id-1650217' class='answer   answerof-426210 ' value='1650217'   \/><label for='answer-id-1650217' id='answer-label-1650217' class=' answer'><span>Increase the number of BeeGFS metadata servers (MDSs) to improve metadata performance.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426210[]' id='answer-id-1650218' class='answer   answerof-426210 ' value='1650218'   \/><label for='answer-id-1650218' id='answer-label-1650218' class=' answer'><span>Implement asynchronous I\/O in the data loading pipeline using a library like NVIDIA DALI to offload data processing tasks from the CPU to the GPU.<\/span><\/label><\/div><div 
class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426210[]' id='answer-id-1650219' class='answer   answerof-426210 ' value='1650219'   \/><label for='answer-id-1650219' id='answer-label-1650219' class=' answer'><span>Decrease the batch size of the training job to reduce the amount of data being processed at each iteration.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426210[]' id='answer-id-1650220' class='answer   answerof-426210 ' value='1650220'   \/><label for='answer-id-1650220' id='answer-label-1650220' class=' answer'><span>Enable data compression on the BeeGFS file system to reduce the amount of data being transferred over the network.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-23' style=';'><div id='questionWrap-23'  class='   watupro-question-id-426211'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>23. <\/span>You are troubleshooting a network performance issue in your NCP-AII environment. 
<br \/>\r<br>After running \u2018ibstat\u2019 on a host, you see the following output for one of the InfiniBand ports: <br \/>\r<br><br><img decoding=\"async\" width=649 height=8 id=\"\u56fe\u7247 33\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/uploads\/2025\/09\/image001.jpg\"><br><br \/>\r<br>What does the \u2018LMC: 0\u2019 indicate, and what are the implications for network performance?<\/div><input type='hidden' name='question_id[]' id='qID_23' value='426211' \/><input type='hidden' id='answerType426211' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426211[]' id='answer-id-1650221' class='answer   answerof-426211 ' value='1650221'   \/><label for='answer-id-1650221' id='answer-label-1650221' class=' answer'><span>LMC: 0 indicates that the link is down and not functioning correctly.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426211[]' id='answer-id-1650222' class='answer   answerof-426211 ' value='1650222'   \/><label for='answer-id-1650222' id='answer-label-1650222' class=' answer'><span>LMC: 0 indicates that Link Aggregation (LAG) is not enabled on this port, meaning only a single link is being used for communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426211[]' id='answer-id-1650223' class='answer   answerof-426211 ' value='1650223'   \/><label for='answer-id-1650223' id='answer-label-1650223' class=' answer'><span>LMC: 0 indicates the port is operating at the lowest possible speed.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426211[]' id='answer-id-1650224' class='answer   answerof-426211 ' value='1650224'   \/><label for='answer-id-1650224' id='answer-label-1650224' class=' answer'><span>LMC: 0 indicates 
that the Subnet Manager is not running correctly.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426211[]' id='answer-id-1650225' class='answer   answerof-426211 ' value='1650225'   \/><label for='answer-id-1650225' id='answer-label-1650225' class=' answer'><span>LMC: 0 is the default and expected value; it has no impact on performance.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-24' style=';'><div id='questionWrap-24'  class='   watupro-question-id-426212'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>24. <\/span>You are deploying a new AI cluster using RoCEv2 over a lossless Ethernet fabric. <br \/>\r<br>Which of the following QoS (Quality of Service) mechanisms is critical for ensuring reliable RDMA communication?<\/div><input type='hidden' name='question_id[]' id='qID_24' value='426212' \/><input type='hidden' id='answerType426212' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426212[]' id='answer-id-1650226' class='answer   answerof-426212 ' value='1650226'   \/><label for='answer-id-1650226' id='answer-label-1650226' class=' answer'><span>DSCP (Differentiated Services Code Point) marking<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426212[]' id='answer-id-1650227' class='answer   answerof-426212 ' value='1650227'   \/><label for='answer-id-1650227' id='answer-label-1650227' class=' answer'><span>ECN (Explicit Congestion Notification)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426212[]' id='answer-id-1650228' class='answer   answerof-426212 ' value='1650228'   \/><label for='answer-id-1650228' id='answer-label-1650228'
class=' answer'><span>PFC (Priority Flow Control)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426212[]' id='answer-id-1650229' class='answer   answerof-426212 ' value='1650229'   \/><label for='answer-id-1650229' id='answer-label-1650229' class=' answer'><span>ACL (Access Control List)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426212[]' id='answer-id-1650230' class='answer   answerof-426212 ' value='1650230'   \/><label for='answer-id-1650230' id='answer-label-1650230' class=' answer'><span>Rate Limiting<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-25' style=';'><div id='questionWrap-25'  class='   watupro-question-id-426213'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>25. <\/span>You are deploying a multi-tenant AI infrastructure where different users or groups have isolated network environments using VXLAN.
<br \/>\r<br>Which of the following is the MOST important consideration when configuring the VTEPs (VXLAN Tunnel Endpoints) on the hosts to ensure proper network isolation and performance?<\/div><input type='hidden' name='question_id[]' id='qID_25' value='426213' \/><input type='hidden' id='answerType426213' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426213[]' id='answer-id-1650231' class='answer   answerof-426213 ' value='1650231'   \/><label for='answer-id-1650231' id='answer-label-1650231' class=' answer'><span>Using the default MTU size of 1500 bytes for VXLAN traffic.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426213[]' id='answer-id-1650232' class='answer   answerof-426213 ' value='1650232'   \/><label for='answer-id-1650232' id='answer-label-1650232' class=' answer'><span>Ensuring that each tenant has a unique VXLAN Network Identifier (VNI) to isolate their traffic.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426213[]' id='answer-id-1650233' class='answer   answerof-426213 ' value='1650233'   \/><label for='answer-id-1650233' id='answer-label-1650233' class=' answer'><span>Using the same IP address for all VTEPs to simplify routing.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426213[]' id='answer-id-1650234' class='answer   answerof-426213 ' value='1650234'   \/><label for='answer-id-1650234' id='answer-label-1650234' class=' answer'><span>Disabling multicast routing to prevent broadcast traffic.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426213[]' id='answer-id-1650235' class='answer   answerof-426213 ' value='1650235'   \/><label for='answer-id-1650235' 
id='answer-label-1650235' class=' answer'><span>Using the same VNI for all tenants to maximize network utilization.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-26' style=';'><div id='questionWrap-26'  class='   watupro-question-id-426214'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>26. <\/span>Consider a scenario where you\u2019re using GPUDirect Storage to enable direct memory access between GPUs and NVMe drives. You observe that while GPUDirect Storage is enabled, you\u2019re not seeing the expected performance gains. <br \/>\r<br>What are potential reasons and configurations you should check to ensure optimal GPUDirect Storage performance? Select all that apply.<\/div><input type='hidden' name='question_id[]' id='qID_26' value='426214' \/><input type='hidden' id='answerType426214' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426214[]' id='answer-id-1650236' class='answer   answerof-426214 ' value='1650236'   \/><label for='answer-id-1650236' id='answer-label-1650236' class=' answer'><span>Verify that the NVMe drives are properly configured in a RAID 0 configuration.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426214[]' id='answer-id-1650237' class='answer   answerof-426214 ' value='1650237'   \/><label for='answer-id-1650237' id='answer-label-1650237' class=' answer'><span>Ensure that the NVMe drives are connected to the system via PCIe Gen4 or Gen5.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426214[]' id='answer-id-1650238' class='answer   answerof-426214 ' value='1650238'   \/><label for='answer-id-1650238' id='answer-label-1650238' class=' answer'><span>Confirm that
the CUDA driver version is compatible with GPUDirect Storage.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426214[]' id='answer-id-1650239' class='answer   answerof-426214 ' value='1650239'   \/><label for='answer-id-1650239' id='answer-label-1650239' class=' answer'><span>Check if the file system supports direct I\/O (e.g., using \u2018directio\u2019 mount option).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426214[]' id='answer-id-1650240' class='answer   answerof-426214 ' value='1650240'   \/><label for='answer-id-1650240' id='answer-label-1650240' class=' answer'><span>Disable CPU-side caching to force all I\/O operations to go directly to the GPU memory.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-27' style=';'><div id='questionWrap-27'  class='   watupro-question-id-426215'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>27. <\/span>Your AI infrastructure includes several NVIDIA A100 GPUs. You notice that the GPU memory bandwidth reported by \u2018nvidia-smi\u2019 is significantly lower than the theoretical maximum for all GPUs. System RAM is plentiful and not being heavily utilized.
<br \/>\r<br>What are TWO potential bottlenecks that could be causing this performance issue?<\/div><input type='hidden' name='question_id[]' id='qID_27' value='426215' \/><input type='hidden' id='answerType426215' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426215[]' id='answer-id-1650241' class='answer   answerof-426215 ' value='1650241'   \/><label for='answer-id-1650241' id='answer-label-1650241' class=' answer'><span>Insufficient CPU cores assigned to the training process.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426215[]' id='answer-id-1650242' class='answer   answerof-426215 ' value='1650242'   \/><label for='answer-id-1650242' id='answer-label-1650242' class=' answer'><span>Inefficient data loading from storage to GPU memory.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426215[]' id='answer-id-1650243' class='answer   answerof-426215 ' value='1650243'   \/><label for='answer-id-1650243' id='answer-label-1650243' class=' answer'><span>The GPUs are connected via PCIe Gen3 instead of PCIe Gen4.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426215[]' id='answer-id-1650244' class='answer   answerof-426215 ' value='1650244'   \/><label for='answer-id-1650244' id='answer-label-1650244' class=' answer'><span>The CPU is using older DDR4 memory with low bandwidth.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426215[]' id='answer-id-1650245' class='answer   answerof-426215 ' value='1650245'   \/><label for='answer-id-1650245' id='answer-label-1650245' class=' answer'><span>The NVIDIA drivers are not configured to enable peer-to-peer memory access between
GPUs.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-28' style=';'><div id='questionWrap-28'  class='   watupro-question-id-426216'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>28. <\/span>Which of the following statements are true regarding the use of Congestion Management (CM) and Congestion Avoidance (CA) techniques within an InfiniBand fabric using NVIDIA technology? (Select TWO)<\/div><input type='hidden' name='question_id[]' id='qID_28' value='426216' \/><input type='hidden' id='answerType426216' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426216[]' id='answer-id-1650246' class='answer   answerof-426216 ' value='1650246'   \/><label for='answer-id-1650246' id='answer-label-1650246' class=' answer'><span>CM\/CA mechanisms are primarily implemented at the IP layer and are independent of the InfiniBand transport layer.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426216[]' id='answer-id-1650247' class='answer   answerof-426216 ' value='1650247'   \/><label for='answer-id-1650247' id='answer-label-1650247' class=' answer'><span>CM aims to reduce the severity of congestion once it has already occurred, while CA aims to prevent congestion from happening in the first place.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426216[]' id='answer-id-1650248' class='answer   answerof-426216 ' value='1650248'   \/><label for='answer-id-1650248' id='answer-label-1650248' class=' answer'><span>InfiniBand\u2019s Explicit Congestion Notification (ECN) is a CA mechanism that allows switches to signal congestion to endpoints before packet loss occurs.<\/span><\/label><\/div><div 
class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426216[]' id='answer-id-1650249' class='answer   answerof-426216 ' value='1650249'   \/><label for='answer-id-1650249' id='answer-label-1650249' class=' answer'><span>CM\/CA are not relevant in InfiniBand fabrics because InfiniBand\u2019s lossless nature guarantees that no packets will ever be dropped due to congestion.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426216[]' id='answer-id-1650250' class='answer   answerof-426216 ' value='1650250'   \/><label for='answer-id-1650250' id='answer-label-1650250' class=' answer'><span>CM can include techniques like rate limiting to throttle traffic flows when congestion is detected.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-29' style=';'><div id='questionWrap-29'  class='   watupro-question-id-426217'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>29. <\/span>You\u2019ve installed a server with multiple NVIDIA A100 GPUs intended for use with Kubernetes and NVIDIA\u2019s GPU Operator. After installing the GPU Operator, you notice that the GPUs are not being properly detected and managed by Kubernetes.
<br \/>\r<br>Which of the following are potential causes and troubleshooting steps you should take?<\/div><input type='hidden' name='question_id[]' id='qID_29' value='426217' \/><input type='hidden' id='answerType426217' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426217[]' id='answer-id-1650251' class='answer   answerof-426217 ' value='1650251'   \/><label for='answer-id-1650251' id='answer-label-1650251' class=' answer'><span>The NVIDIA drivers are not properly installed on the host operating system before installing the GPU Operator. Verify the driver installation using \u2018nvidia-smi\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426217[]' id='answer-id-1650252' class='answer   answerof-426217 ' value='1650252'   \/><label for='answer-id-1650252' id='answer-label-1650252' class=' answer'><span>The Kubernetes nodes are not labeled correctly to indicate the presence of NVIDIA GPUs. Use \u2018kubectl label node nvidia.com\/gpu.present=true\u2019.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426217[]' id='answer-id-1650253' class='answer   answerof-426217 ' value='1650253'   \/><label for='answer-id-1650253' id='answer-label-1650253' class=' answer'><span>The NVIDIA Container Toolkit is not installed on the Kubernetes nodes. Install the toolkit according to NVIDIA\u2019s documentation.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426217[]' id='answer-id-1650254' class='answer   answerof-426217 ' value='1650254'   \/><label for='answer-id-1650254' id='answer-label-1650254' class=' answer'><span>The GPU Operator\u2019s configuration is incorrect, preventing it from properly discovering and managing the GPUs.
Check the GPU Operator\u2019s logs and configuration files.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426217[]' id='answer-id-1650255' class='answer   answerof-426217 ' value='1650255'   \/><label for='answer-id-1650255' id='answer-label-1650255' class=' answer'><span>The \u2018nvidia-docker2\u2019 runtime is not set as the default runtime in \u2018\/etc\/docker\/daemon.json\u2019. Change the default runtime to \u2018nvidia\u2019 and restart the Docker daemon.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-30' style=';'><div id='questionWrap-30'  class='   watupro-question-id-426218'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>30. <\/span>You are tasked with configuring an NVIDIA NVLink Switch system. After physically connecting the GPUs and the switch, what is the typical first step in the software configuration process?<\/div><input type='hidden' name='question_id[]' id='qID_30' value='426218' \/><input type='hidden' id='answerType426218' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426218[]' id='answer-id-1650256' class='answer   answerof-426218 ' value='1650256'   \/><label for='answer-id-1650256' id='answer-label-1650256' class=' answer'><span>Installing the latest NVIDIA drivers on all connected GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426218[]' id='answer-id-1650257' class='answer   answerof-426218 ' value='1650257'   \/><label for='answer-id-1650257' id='answer-label-1650257' class=' answer'><span>Configuring the system BIOS to enable NVLink support.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426218[]'
id='answer-id-1650258' class='answer   answerof-426218 ' value='1650258'   \/><label for='answer-id-1650258' id='answer-label-1650258' class=' answer'><span>Updating the firmware of the NVLink Switch.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426218[]' id='answer-id-1650259' class='answer   answerof-426218 ' value='1650259'   \/><label for='answer-id-1650259' id='answer-label-1650259' class=' answer'><span>Installing the NVLink Switch management software.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426218[]' id='answer-id-1650260' class='answer   answerof-426218 ' value='1650260'   \/><label for='answer-id-1650260' id='answer-label-1650260' class=' answer'><span>Running a memory bandwidth test between all connected GPUs.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-31' style=';'><div id='questionWrap-31'  class='   watupro-question-id-426219'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>31. <\/span>You\u2019re optimizing a deep learning model for deployment on NVIDIA Tensor Cores. The model uses a mix of FP32 and FP16 precision. During profiling with NVIDIA Nsight Systems, you observe that the Tensor Cores are underutilized. 
<br \/>\r<br>Which of the following strategies would MOST effectively improve Tensor Core utilization?<\/div><input type='hidden' name='question_id[]' id='qID_31' value='426219' \/><input type='hidden' id='answerType426219' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426219[]' id='answer-id-1650261' class='answer   answerof-426219 ' value='1650261'   \/><label for='answer-id-1650261' id='answer-label-1650261' class=' answer'><span>Increase the batch size to fully utilize the available GPU memory.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426219[]' id='answer-id-1650262' class='answer   answerof-426219 ' value='1650262'   \/><label for='answer-id-1650262' id='answer-label-1650262' class=' answer'><span>Ensure that all matrix multiplications are performed using FP16 precision.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426219[]' id='answer-id-1650263' class='answer   answerof-426219 ' value='1650263'   \/><label for='answer-id-1650263' id='answer-label-1650263' class=' answer'><span>Pad the input tensors to dimensions that are multiples of 8 for optimal Tensor Core alignment.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426219[]' id='answer-id-1650264' class='answer   answerof-426219 ' value='1650264'   \/><label for='answer-id-1650264' id='answer-label-1650264' class=' answer'><span>Enable CUDA graph capture to reduce kernel launch overhead.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426219[]' id='answer-id-1650265' class='answer   answerof-426219 ' value='1650265'   \/><label for='answer-id-1650265' id='answer-label-1650265' class=' answer'><span>Decrease the learning rate 
to improve training stability and reduce the need for gradient clipping.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-32' style=';'><div id='questionWrap-32'  class='   watupro-question-id-426220'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>32. <\/span>You are tasked with ensuring optimal power efficiency for a GPU server running machine learning workloads. You want to dynamically adjust the GPU\u2019s power consumption based on its utilization. <br \/>\r<br>Which of the following methods is the MOST suitable for achieving this, assuming the server\u2019s BIOS and the NVIDIA drivers support it?<\/div><input type='hidden' name='question_id[]' id='qID_32' value='426220' \/><input type='hidden' id='answerType426220' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426220[]' id='answer-id-1650266' class='answer   answerof-426220 ' value='1650266'   \/><label for='answer-id-1650266' id='answer-label-1650266' class=' answer'><span>Manually set the GPU\u2019s power limit using \u2018nvidia-smi -pl\u2019 and create a script to monitor utilization and adjust the power limit periodically.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426220[]' id='answer-id-1650267' class='answer   answerof-426220 ' value='1650267'   \/><label for='answer-id-1650267' id='answer-label-1650267' class=' answer'><span>Configure the server\u2019s BIOS\/UEFI to use a power-saving profile, which will automatically reduce the GPU\u2019s power consumption when idle.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426220[]' id='answer-id-1650268' class='answer   answerof-426220 ' value='1650268'   \/><label for='answer-id-1650268'
id='answer-label-1650268' class=' answer'><span>Enable Dynamic Boost in the NVIDIA Control Panel (if available), which will automatically allocate power between the CPU and GPU based on their current needs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426220[]' id='answer-id-1650269' class='answer   answerof-426220 ' value='1650269'   \/><label for='answer-id-1650269' id='answer-label-1650269' class=' answer'><span>Use NVIDIA\u2019s Data Center GPU Manager (DCGM) to monitor GPU utilization and dynamically adjust the power limit based on a predefined policy.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426220[]' id='answer-id-1650270' class='answer   answerof-426220 ' value='1650270'   \/><label for='answer-id-1650270' id='answer-label-1650270' class=' answer'><span>Disable ECC (Error Correcting Code) on the GPU to reduce power consumption.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-33' style=';'><div id='questionWrap-33'  class='   watupro-question-id-426221'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>33. 
<\/span>Which protocol is commonly used in Spine-Leaf architectures for dynamic routing and load balancing across multiple paths?<\/div><input type='hidden' name='question_id[]' id='qID_33' value='426221' \/><input type='hidden' id='answerType426221' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426221[]' id='answer-id-1650271' class='answer   answerof-426221 ' value='1650271'   \/><label for='answer-id-1650271' id='answer-label-1650271' class=' answer'><span>STP (Spanning Tree Protocol)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426221[]' id='answer-id-1650272' class='answer   answerof-426221 ' value='1650272'   \/><label for='answer-id-1650272' id='answer-label-1650272' class=' answer'><span>OSPF (Open Shortest Path First)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426221[]' id='answer-id-1650273' class='answer   answerof-426221 ' value='1650273'   \/><label for='answer-id-1650273' id='answer-label-1650273' class=' answer'><span>VRRP (Virtual Router Redundancy Protocol)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426221[]' id='answer-id-1650274' class='answer   answerof-426221 ' value='1650274'   \/><label for='answer-id-1650274' id='answer-label-1650274' class=' answer'><span>ECMP (Equal-Cost Multi-Path)<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426221[]' id='answer-id-1650275' class='answer   answerof-426221 ' value='1650275'   \/><label for='answer-id-1650275' id='answer-label-1650275' class=' answer'><span>BGP (Border Gateway Protocol)<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' 
id='question-34' style=';'><div id='questionWrap-34'  class='   watupro-question-id-426222'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>34. <\/span>Which of the following techniques are effective for improving inter-GPU communication performance in a multi-GPU Intel Xeon server used for distributed deep learning training with NCCL?<\/div><input type='hidden' name='question_id[]' id='qID_34' value='426222' \/><input type='hidden' id='answerType426222' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426222[]' id='answer-id-1650276' class='answer   answerof-426222 ' value='1650276'   \/><label for='answer-id-1650276' id='answer-label-1650276' class=' answer'><span>Enabling PCIe peer-to-peer transfers between GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426222[]' id='answer-id-1650277' class='answer   answerof-426222 ' value='1650277'   \/><label for='answer-id-1650277' id='answer-label-1650277' class=' answer'><span>Utilizing InfiniBand or RoCE interconnects if available.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426222[]' id='answer-id-1650278' class='answer   answerof-426222 ' value='1650278'   \/><label for='answer-id-1650278' id='answer-label-1650278' class=' answer'><span>Increasing the system RAM size to minimize data transfer to disk.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426222[]' id='answer-id-1650279' class='answer   answerof-426222 ' value='1650279'   \/><label for='answer-id-1650279' id='answer-label-1650279' class=' answer'><span>Configuring NCCL to use the correct network interface and transport protocol (e.g., IB, Socket).<\/span><\/label><\/div><div
class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426222[]' id='answer-id-1650280' class='answer   answerof-426222 ' value='1650280'   \/><label for='answer-id-1650280' id='answer-label-1650280' class=' answer'><span>Disabling CPU frequency scaling to maintain consistent performance.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-35' style=';'><div id='questionWrap-35'  class='   watupro-question-id-426223'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>35. <\/span>An AI server with 8 GPUs is experiencing random system crashes under heavy load. The system logs indicate potential memory errors, but standard memory tests (memtest86+) pass without any failures. The GPUs are passively cooled. <br \/>\r<br>What are the THREE most likely root causes of these crashes?<\/div><input type='hidden' name='question_id[]' id='qID_35' value='426223' \/><input type='hidden' id='answerType426223' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426223[]' id='answer-id-1650281' class='answer   answerof-426223 ' value='1650281'   \/><label for='answer-id-1650281' id='answer-label-1650281' class=' answer'><span>Incompatible NVIDIA driver version with the installed Linux kernel.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426223[]' id='answer-id-1650282' class='answer   answerof-426223 ' value='1650282'   \/><label for='answer-id-1650282' id='answer-label-1650282' class=' answer'><span>GPU memory errors that are not detectable by standard CPU-based memory tests.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426223[]' id='answer-id-1650283' class='answer   answerof-426223 '
value='1650283'   \/><label for='answer-id-1650283' id='answer-label-1650283' class=' answer'><span>Insufficient airflow within the server, leading to overheating of the GPUs and VRMs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426223[]' id='answer-id-1650284' class='answer   answerof-426223 ' value='1650284'   \/><label for='answer-id-1650284' id='answer-label-1650284' class=' answer'><span>A faulty power supply unit (PSU) that is unable to provide stable power under peak load.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426223[]' id='answer-id-1650285' class='answer   answerof-426223 ' value='1650285'   \/><label for='answer-id-1650285' id='answer-label-1650285' class=' answer'><span>Network congestion causing intermittent data corruption during distributed training.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-36' style=';'><div id='questionWrap-36'  class='   watupro-question-id-426224'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>36. <\/span>Which of the following are key benefits of using NVIDIA Spectrum-X switches in an AI infrastructure compared to traditional Ethernet switches?
(Select THREE)<\/div><input type='hidden' name='question_id[]' id='qID_36' value='426224' \/><input type='hidden' id='answerType426224' value='checkbox'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426224[]' id='answer-id-1650286' class='answer   answerof-426224 ' value='1650286'   \/><label for='answer-id-1650286' id='answer-label-1650286' class=' answer'><span>Lower cost per port.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426224[]' id='answer-id-1650287' class='answer   answerof-426224 ' value='1650287'   \/><label for='answer-id-1650287' id='answer-label-1650287' class=' answer'><span>Support for RoCE (RDMA over Converged Ethernet) and InfiniBand protocols, enabling high-bandwidth, low-latency communication.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426224[]' id='answer-id-1650288' class='answer   answerof-426224 ' value='1650288'   \/><label for='answer-id-1650288' id='answer-label-1650288' class=' answer'><span>Advanced telemetry and monitoring capabilities for network performance optimization.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426224[]' id='answer-id-1650289' class='answer   answerof-426224 ' value='1650289'   \/><label for='answer-id-1650289' id='answer-label-1650289' class=' answer'><span>Hardware-based acceleration for collective communication operations used in distributed AI training.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='checkbox' name='answer-426224[]' id='answer-id-1650290' class='answer   answerof-426224 ' value='1650290'   \/><label for='answer-id-1650290' id='answer-label-1650290' class=' answer'><span>Native support for IPv6.<\/span><\/label><\/div><!-- end
question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-37' style=';'><div id='questionWrap-37'  class='   watupro-question-id-426225'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>37. <\/span>A critical AI model training job consistently fails on a specific GPU server in your cluster after running for approximately 24 hours. <br \/>\r<br>Monitoring data shows a sudden drop in GPU power consumption followed by a system reboot. All other GPUs on the server appear normal. The server has redundant PSUs. <br \/>\r<br>What is the MOST likely cause?<\/div><input type='hidden' name='question_id[]' id='qID_37' value='426225' \/><input type='hidden' id='answerType426225' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426225[]' id='answer-id-1650291' class='answer   answerof-426225 ' value='1650291'   \/><label for='answer-id-1650291' id='answer-label-1650291' class=' answer'><span>A software bug in the AI model causing a kernel panic specifically triggered after 24 hours of execution.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426225[]' id='answer-id-1650292' class='answer   answerof-426225 ' value='1650292'   \/><label for='answer-id-1650292' id='answer-label-1650292' class=' answer'><span>Thermal runaway on the GPU due to a failing thermal interface material (TIM) between the GPU die and the heatsink.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426225[]' id='answer-id-1650293' class='answer   answerof-426225 ' value='1650293'   \/><label for='answer-id-1650293' id='answer-label-1650293' class=' answer'><span>A transient power supply issue affecting only one of the redundant PSUs, triggering a system-wide protection 
mechanism.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426225[]' id='answer-id-1650294' class='answer   answerof-426225 ' value='1650294'   \/><label for='answer-id-1650294' id='answer-label-1650294' class=' answer'><span>ECC memory errors accumulating over time, eventually leading to a non-recoverable system fault.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426225[]' id='answer-id-1650295' class='answer   answerof-426225 ' value='1650295'   \/><label for='answer-id-1650295' id='answer-label-1650295' class=' answer'><span>A driver crash, causing the GPU to reset and the system to reboot.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-38' style=';'><div id='questionWrap-38'  class='   watupro-question-id-426226'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>38. <\/span>After replacing a GPU in a multi-GPU server, you notice that the new GPU is consistently running at a lower clock speed than the other GPUs, even under load. \u2018nvidia-smi\u2019 shows the \u2018Perf\u2019 state as \u2018P8\u2019 for the new GPU, while the others are at \u2018P0\u2019. 
<br \/>\r<br>What is the MOST probable cause?<\/div><input type='hidden' name='question_id[]' id='qID_38' value='426226' \/><input type='hidden' id='answerType426226' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426226[]' id='answer-id-1650296' class='answer   answerof-426226 ' value='1650296'   \/><label for='answer-id-1650296' id='answer-label-1650296' class=' answer'><span>The new GPU is a lower-performance model than the other GPUs.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426226[]' id='answer-id-1650297' class='answer   answerof-426226 ' value='1650297'   \/><label for='answer-id-1650297' id='answer-label-1650297' class=' answer'><span>The driver is not properly recognizing the new GPU\u2019s capabilities; reinstall the driver.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426226[]' id='answer-id-1650298' class='answer   answerof-426226 ' value='1650298'   \/><label for='answer-id-1650298' id='answer-label-1650298' class=' answer'><span>The new GPU is not receiving sufficient power; check the power connections and PSU capacity.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426226[]' id='answer-id-1650299' class='answer   answerof-426226 ' value='1650299'   \/><label for='answer-id-1650299' id='answer-label-1650299' class=' answer'><span>The new GPU is overheating and throttling performance.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426226[]' id='answer-id-1650300' class='answer   answerof-426226 ' value='1650300'   \/><label for='answer-id-1650300' id='answer-label-1650300' class=' answer'><span>The new GPU requires a firmware update that hasn\u2019t been 
applied.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-39' style=';'><div id='questionWrap-39'  class='   watupro-question-id-426227'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>39. <\/span>You are configuring a network for a distributed training job using multiple DGX servers connected via InfiniBand. After launching the training job, you observe that the inter-GPU communication is significantly slower than expected, even though \u2018ibstat\u2019 shows all links are up and active. <br \/>\r<br>What is the MOST likely cause of this performance bottleneck?<\/div><input type='hidden' name='question_id[]' id='qID_39' value='426227' \/><input type='hidden' id='answerType426227' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426227[]' id='answer-id-1650301' class='answer   answerof-426227 ' value='1650301'   \/><label for='answer-id-1650301' id='answer-label-1650301' class=' answer'><span>The default MTU size of 1500 is too small for efficient large data transfers.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426227[]' id='answer-id-1650302' class='answer   answerof-426227 ' value='1650302'   \/><label for='answer-id-1650302' id='answer-label-1650302' class=' answer'><span>Incorrect placement of GPUs across NUMA nodes, leading to increased inter-node latency.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426227[]' id='answer-id-1650303' class='answer   answerof-426227 ' value='1650303'   \/><label for='answer-id-1650303' id='answer-label-1650303' class=' answer'><span>The CPU frequency scaling governor is set to \u2018powersave\u2019, limiting CPU performance.<\/span><\/label><\/div><div 
class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426227[]' id='answer-id-1650304' class='answer   answerof-426227 ' value='1650304'   \/><label for='answer-id-1650304' id='answer-label-1650304' class=' answer'><span>The InfiniBand subnet manager (SM) is configured incorrectly or experiencing performance issues (e.g., path selection is suboptimal, congestion control is not enabled).<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426227[]' id='answer-id-1650305' class='answer   answerof-426227 ' value='1650305'   \/><label for='answer-id-1650305' id='answer-label-1650305' class=' answer'><span>The RDMA memory registration limit is too low, causing frequent memory registration and unregistration overhead.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div class='watu-question ' id='question-40' style=';'><div id='questionWrap-40'  class='   watupro-question-id-426228'>\n\t\t\t<div class='question-content'><div><span class='watupro_num'>40. <\/span>You are configuring an InfiniBand subnet with multiple switches. You need to ensure that traffic between two specific nodes always takes the shortest path, bypassing a potentially congested link. 
<br \/>\r<br>Which of the following approaches is MOST effective for achieving this using InfiniBand\u2019s routing capabilities?<\/div><input type='hidden' name='question_id[]' id='qID_40' value='426228' \/><input type='hidden' id='answerType426228' value='radio'><!-- end question-content--><\/div><div class='question-choices watupro-choices-columns '><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426228[]' id='answer-id-1650306' class='answer   answerof-426228 ' value='1650306'   \/><label for='answer-id-1650306' id='answer-label-1650306' class=' answer'><span>Rely solely on the Subnet Manager\u2019s (SM) default path computation algorithm (e.g., Min Hop) without any modifications.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426228[]' id='answer-id-1650307' class='answer   answerof-426228 ' value='1650307'   \/><label for='answer-id-1650307' id='answer-label-1650307' class=' answer'><span>Use static routing by manually configuring forwarding tables on each switch along the desired path. 
This involves specifying DLID-to-Port mappings.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426228[]' id='answer-id-1650308' class='answer   answerof-426228 ' value='1650308'   \/><label for='answer-id-1650308' id='answer-label-1650308' class=' answer'><span>Implement Quality of Service (QoS) to prioritize the traffic between the two nodes, hoping that this will influence the path selection.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426228[]' id='answer-id-1650309' class='answer   answerof-426228 ' value='1650309'   \/><label for='answer-id-1650309' id='answer-label-1650309' class=' answer'><span>Utilize the ibroute command or similar tool to inject a static route between the nodes, forcing traffic to follow a specific path identified by LID and port number.<\/span><\/label><\/div><div class='watupro-question-choice  ' dir='auto' ><input type='radio' name='answer-426228[]' id='answer-id-1650310' class='answer   answerof-426228 ' value='1650310'   \/><label for='answer-id-1650310' id='answer-label-1650310' class=' answer'><span>Decrease the MTU size on the potentially congested link.<\/span><\/label><\/div><!-- end question-choices--><\/div><!-- end questionWrap--><\/div><\/div><div style='display:none' id='question-41'>\n\t<div class='question-content'>\n\t\t<img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/img\/loading.gif\" width=\"16\" height=\"16\" alt=\"Loading...\" title=\"Loading...\" \/>&nbsp;Loading...\t<\/div>\n<\/div>\n\n<br \/>\n\t\n\t\t\t<div class=\"watupro_buttons flex \" id=\"watuPROButtons10795\" >\n\t\t  <div id=\"prev-question\" style=\"display:none;\"><input type=\"button\" value=\"&lt; Previous\" onclick=\"WatuPRO.nextQuestion(event, 'previous');\"\/><\/div>\t\t  \t\t  \t\t   \n\t\t   \t  \t\t<div><input type=\"button\" name=\"action\" 
class=\"watupro-submit-button\" onclick=\"WatuPRO.submitResult(event)\" id=\"action-button\" value=\"View Results\"  \/>\n\t\t<\/div>\n\t\t<\/div>\n\t\t\n\t<input type=\"hidden\" name=\"quiz_id\" value=\"10795\" id=\"watuPROExamID\"\/>\n\t<input type=\"hidden\" name=\"start_time\" id=\"startTime\" value=\"2026-04-21 13:22:50\" \/>\n\t<input type=\"hidden\" name=\"start_timestamp\" id=\"startTimeStamp\" value=\"1776777770\" \/>\n\t<input type=\"hidden\" name=\"question_ids\" value=\"\" \/>\n\t<input type=\"hidden\" name=\"watupro_questions\" value=\"426189:1650111,1650112,1650113,1650114,1650115 | 426190:1650116,1650117,1650118,1650119,1650120 | 426191:1650121,1650122,1650123,1650124,1650125 | 426192:1650126,1650127,1650128,1650129,1650130 | 426193:1650131,1650132,1650133,1650134,1650135 | 426194:1650136,1650137,1650138,1650139,1650140 | 426195:1650141,1650142,1650143,1650144 | 426196:1650145,1650146,1650147,1650148,1650149 | 426197:1650150,1650151,1650152,1650153,1650154 | 426198:1650155,1650156,1650157,1650158,1650159 | 426199:1650160,1650161,1650162,1650163,1650164 | 426200:1650165,1650166,1650167,1650168,1650169 | 426201:1650170,1650171,1650172,1650173,1650174 | 426202:1650175,1650176,1650177,1650178,1650179,1650180 | 426203:1650181,1650182,1650183,1650184,1650185 | 426204:1650186,1650187,1650188,1650189,1650190 | 426205:1650191,1650192,1650193,1650194,1650195 | 426206:1650196,1650197,1650198,1650199,1650200 | 426207:1650201,1650202,1650203,1650204,1650205 | 426208:1650206,1650207,1650208,1650209,1650210 | 426209:1650211,1650212,1650213,1650214,1650215 | 426210:1650216,1650217,1650218,1650219,1650220 | 426211:1650221,1650222,1650223,1650224,1650225 | 426212:1650226,1650227,1650228,1650229,1650230 | 426213:1650231,1650232,1650233,1650234,1650235 | 426214:1650236,1650237,1650238,1650239,1650240 | 426215:1650241,1650242,1650243,1650244,1650245 | 426216:1650246,1650247,1650248,1650249,1650250 | 426217:1650251,1650252,1650253,1650254,1650255 | 
426218:1650256,1650257,1650258,1650259,1650260 | 426219:1650261,1650262,1650263,1650264,1650265 | 426220:1650266,1650267,1650268,1650269,1650270 | 426221:1650271,1650272,1650273,1650274,1650275 | 426222:1650276,1650277,1650278,1650279,1650280 | 426223:1650281,1650282,1650283,1650284,1650285 | 426224:1650286,1650287,1650288,1650289,1650290 | 426225:1650291,1650292,1650293,1650294,1650295 | 426226:1650296,1650297,1650298,1650299,1650300 | 426227:1650301,1650302,1650303,1650304,1650305 | 426228:1650306,1650307,1650308,1650309,1650310\" \/>\n\t<input type=\"hidden\" name=\"no_ajax\" value=\"0\">\t\t\t<\/form>\n\t<p>&nbsp;<\/p>\n<\/div>\n\n<script type=\"text\/javascript\">\n\/\/jQuery(document).ready(function(){\ndocument.addEventListener(\"DOMContentLoaded\", function(event) { \t\nvar question_ids = \"426189,426190,426191,426192,426193,426194,426195,426196,426197,426198,426199,426200,426201,426202,426203,426204,426205,426206,426207,426208,426209,426210,426211,426212,426213,426214,426215,426216,426217,426218,426219,426220,426221,426222,426223,426224,426225,426226,426227,426228\";\nWatuPROSettings[10795] = {};\nWatuPRO.qArr = question_ids.split(',');\nWatuPRO.exam_id = 10795;\t    \nWatuPRO.post_id = 111616;\nWatuPRO.store_progress = 0;\nWatuPRO.curCatPage = 1;\nWatuPRO.requiredIDs=\"0\".split(\",\");\nWatuPRO.hAppID = \"0.56716300 1776777770\";\nvar url = \"https:\/\/www.dumpsbase.com\/freedumps\/wp-content\/plugins\/watupro\/show_exam.php\";\nWatuPRO.examMode = 1;\nWatuPRO.siteURL=\"https:\/\/www.dumpsbase.com\/freedumps\/wp-admin\/admin-ajax.php\";\nWatuPRO.emailIsNotRequired = 0;\nWatuPROIntel.init(10795);\nWatuPRO.inCategoryPages=1;});    \t \n<\/script>\n","protected":false},"excerpt":{"rendered":"<p>How to complete your NVIDIA Certified Professional AI Infrastructure (NCP-AII) certification exam quickly and smoothly? You can choose the NCP-AII dumps (V8.02) and study all the latest exam questions and answers now. 
With DumpsBase\u2019s NCP-AII exam dumps, passing your NVIDIA NCP-AII certification exam can be more seamless and more feasible than you ever envisioned. Before [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18718,18913],"tags":[19981,19781],"class_list":["post-111616","post","type-post","status-publish","format-standard","hentry","category-nvidia","category-nvidia-certified-professional","tag-ncp-aii-practice-test-questions","tag-nvidia-certified-professional-ai-infrastructure-ncp-aii"],"_links":{"self":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/111616","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/comments?post=111616"}],"version-history":[{"count":1,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/111616\/revisions"}],"predecessor-version":[{"id":111617,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/posts\/111616\/revisions\/111617"}],"wp:attachment":[{"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/media?parent=111616"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/categories?post=111616"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dumpsbase.com\/freedumps\/wp-json\/wp\/v2\/tags?post=111616"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}