Author: Clark Li, Country Manager of KAYTUS for the DACH region. Clark has over 20 years of experience in the IT industry and has specialized in HPC, AI, cloud, and enterprise IT solutions for the past 10 years.
Industry, policymakers, and consumers all understand that energy must be saved to reach sustainability goals. The question, however, is how this can best be achieved. IT infrastructure provider KAYTUS shows how these ambitious energy-saving targets can be met through green computing.
In recent years, generative AI applications, exemplified by diverse LLMs such as GPT-4, Gemini, and Llama, have developed rapidly. The performance of graphics processing units (GPUs) has increased roughly 1,000-fold over the eight years from the Pascal to the Blackwell generation. With the advancement of AI applications, attention has turned to the skyrocketing energy consumption of servers and data centers. According to the report “The AI Disruption: Challenges and Guidance for Data Center Design”, AI workloads will consume 14-18.7 GW of power in 2028. This has led to higher demands for green data centers. Microsoft, for example, has committed to being carbon negative by 2030, yet the company’s CO2 emissions have risen by nearly 30% since 2020, largely due to AI. In response, Microsoft is implementing low-power server states, which have enabled reductions in energy usage of up to 25%.
Liquid cooling is not the only solution for sustainable IT
Liquid cooling is in high demand, especially with the widespread use of AI, but it is not the appropriate solution for every scenario: green computing goes beyond liquid cooling. Green computing is a systematic optimization process in which the architecture design, the components, and the overall system are optimized according to application needs. Through such systematic optimization, servers can deliver better energy efficiency across application scenarios such as cloud, virtualization, databases, and HPC, whether they use air cooling or liquid cooling. For example, optimized air cooling can fulfill the heat-dissipation requirements of a CPU with a 500 W TDP, and even of an NVLink GPU with a 700 W TDP. So what is green computing all about?
The key to green computing
Generally speaking, green computing is the continual optimization and upgrading of data center architecture to enhance the energy efficiency of computing power from generation through transmission to application. This energy efficiency should be measured from the individual components through the entire system up to the upper-layer applications.
For server manufacturers, achieving holistic system optimization, from hardware and software components to the whole system, is crucial. This comprehensive approach minimizes environmental impact while maximizing the useful application of the computing power generated, thereby reducing additional energy losses. As a result, end users can choose the green computing platform that best suits their application needs and operational efficiency goals while complying with stringent carbon emission regulations, making it an environmentally and economically sound decision.
Systematic optimization from components and software to the whole system
Based on the green computing concept, how can air-cooled systems meet the needs of applications with the most demanding cooling requirements? An excellent air-cooled system applies high-quality, environmentally friendly design and optimization at the component, software, and whole-system levels. Let's take a closer look at these methods.
At the component level, the structural design of components including fans, air ducts, and heat sinks can be optimized to improve heat dissipation efficiency. The positive effects are:
■ More airflow: In efficient air cooling, a higher airflow volume means more coolant (air) to carry heat away. Key factors for increasing the air volume flowing through the server include the angle of attack of the fan blades and the open area of the front and rear panels.
■ More stable cooling: For a given energy expenditure, the efficiency of the cooling process is directly proportional to the smoothness of the airflow. Achieving this places stringent demands on low-resistance air-duct design: air-guide vents and backplanes must be engineered to mitigate internal turbulence and keep the airflow streamlined. Such designs are pivotal to effective thermal management and contribute to a more energy-efficient cooling system.
■ More innovative heat sinks: Optimizing the heat sink directly enhances cooling efficiency. For example, increasing the fin surface area of a heat sink can meet the air-cooling requirements of a CPU with a 500 W TDP. A siphon heat sink, which uses phase change for efficient heat conduction, can improve heat dissipation by 15% and reduce power consumption by 10%, meeting the cooling needs of a 1U two-socket server with 350 W processors. An EVAC (Enhanced Volume Air Cool) heat sink can reduce the full-load CPU temperature from 85°C to 75°C compared with a standard heat sink.
■ More efficient power supply: Server manufacturers can adopt new semiconductor materials in power supply units to enhance efficiency and stability. The gallium nitride (GaN) Power Supply Unit (PSU), for instance, is now considered an optimal choice because of its high switching frequencies: GaN-based 3.2 kW Titanium PSUs can deliver a power density of 100 W/in³, significantly reducing energy losses.
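The airflow point above can be made concrete with a back-of-envelope sensible-heat calculation. The sketch below uses the standard relation Q = ρ·V̇·c_p·ΔT to estimate the airflow a 500 W CPU needs; the 500 W TDP comes from the text, while the air properties and the allowed air temperature rise are illustrative assumptions, not vendor data.

```python
# Back-of-envelope check: how much airflow does a 500 W CPU need?
# Sensible heat removed by air: Q = rho * V_dot * cp * dT
# (standard formula; the component values below are illustrative)

RHO_AIR = 1.2    # kg/m^3, air density at roughly 20 degrees C
CP_AIR = 1005.0  # J/(kg*K), specific heat capacity of air

def required_airflow_cfm(heat_w: float, delta_t_k: float) -> float:
    """Volumetric airflow (in CFM) needed to remove heat_w watts
    with an air temperature rise of delta_t_k kelvin."""
    v_dot_m3s = heat_w / (RHO_AIR * CP_AIR * delta_t_k)  # m^3/s
    return v_dot_m3s * 2118.88  # 1 m^3/s is about 2118.88 CFM

if __name__ == "__main__":
    for dt in (10, 15, 20):
        cfm = required_airflow_cfm(500, dt)
        print(f"500 W CPU, air dT = {dt} K -> ~{cfm:.0f} CFM")
```

The calculation shows why the two levers named above matter: fan blades and panel openings that push more air through the chassis allow the same heat load to be removed with a smaller temperature rise.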
At the software level, energy-saving measures can be taken through intelligent, fine-grained control of components such as the power supply and fan speed. This approach enables improvements such as:
■ Intelligent power control: Complex Programmable Logic Devices (CPLDs) enable intelligent management of the power supplied to hard disks. The system can selectively limit throughput to specific disks and transition the other drives into a dormant state. This disk-level management can save up to 70% of power consumption, a significant advance in energy efficiency for data storage systems.
■ Refined cooling strategy: Different workloads require different cooling strategies. Beyond defining those strategies, server manufacturers can place sensors inside servers to measure real-time temperatures at different locations, enabling dynamic, intelligent control of the fans.
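As a rough illustration of such a dynamic fan strategy, the sketch below maps per-zone temperature readings to a fan PWM duty via a piecewise-linear curve and drives the shared fans from the hottest zone. The zone names, thresholds, and curve shape are assumptions for the example, not a description of any vendor's firmware.

```python
# Illustrative sketch of sensor-driven fan control (not vendor
# firmware): each thermal zone reports a temperature, a fan curve
# converts it to a PWM duty, and the fans follow the hottest zone.

def fan_duty(temp_c: float, t_min: float = 30.0, t_max: float = 75.0,
             duty_min: float = 20.0, duty_max: float = 100.0) -> float:
    """Piecewise-linear fan curve: idle duty below t_min,
    full duty above t_max, and a linear ramp in between."""
    if temp_c <= t_min:
        return duty_min
    if temp_c >= t_max:
        return duty_max
    frac = (temp_c - t_min) / (t_max - t_min)
    return duty_min + frac * (duty_max - duty_min)

def control_step(zone_temps: dict) -> float:
    """One control iteration: drive the shared fans from the
    hottest zone so every component stays within its limit."""
    return max(fan_duty(t) for t in zone_temps.values())

if __name__ == "__main__":
    readings = {"cpu0": 62.0, "cpu1": 58.0, "dimm": 45.0, "nvme": 51.0}
    print(f"fan duty: {control_step(readings):.1f}%")
```

Running this loop periodically, instead of pinning fans at a fixed speed sized for the worst case, is what lets the fans spin down whenever the workload allows.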
At the whole-system level, the rack-scale server solution is a promising approach that further reduces energy costs and overall operating expenses by centralizing power supply, cooling, and management for the entire rack. A rack-scale server solution can double computing density and reduce purchasing costs by 80%.
■ With shared fans, cooling efficiency can be greatly enhanced. Besides more efficient fan control, larger fans, such as those taller than 2U, can be used.
■ By pooling the power supply, the rack-scale server solution can save 2.5 million kilowatt-hours of electricity per year at a scale of one thousand units.
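The quoted figure can be put in per-server terms with simple arithmetic. The sketch below derives the average per-unit saving implied by the claim; the only inputs are the 2.5 million kWh and one thousand units from the text, so the per-unit numbers are derived, not separately measured.

```python
# Put the rack-scale claim in per-unit terms:
# 2.5 million kWh saved per year across one thousand units.

TOTAL_KWH_PER_YEAR = 2_500_000  # figure quoted in the text
UNITS = 1_000                   # scale quoted in the text
HOURS_PER_YEAR = 365 * 24       # 8760 hours

per_unit_kwh = TOTAL_KWH_PER_YEAR / UNITS             # kWh per unit per year
avg_saving_w = per_unit_kwh * 1000 / HOURS_PER_YEAR   # average watts saved

print(f"per unit: {per_unit_kwh:.0f} kWh/year "
      f"(~{avg_saving_w:.0f} W continuous saving)")
```

In other words, the claim corresponds to a continuous saving on the order of a few hundred watts per unit, which is plausible for pooled, higher-efficiency power conversion plus shared cooling.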
Conclusion
Through systematic optimization, servers can now attain peak cooling performance. Consider the cutting-edge, multi-node K22V2 server as a prime example. This innovative server employs a 2U half-width architecture that arranges nodes horizontally, a pioneering design approach. This energy-efficient layout gives the server 40% higher cooling efficiency than two standard 1U rack servers under identical computing conditions, along with an up to 8% reduction in power usage.
In summary, green computing is fundamentally a holistic optimization approach that encompasses components, system software, and applications. By strategically enhancing energy efficiency across the system, servers, not merely computational workhorses but the very bedrock of IT infrastructure, can pave the way for a sustainable and eco-conscious future in information technology.
For more information on the topic of sustainable IT infrastructure for data centers, visit: https://www.kaytus.com/