Minh Pham; Hao Li; Yongke Yuan; Chengcheng Mou; Kandethody Ramachandran; Zichen Xu; Yicheng Tu
Science Published: (Dec/2021)
Massively parallel systems, such as Graphics Pro- cessing Units (GPUs), are becoming increasingly important in today’s data-intensive computing environments. Due to the high level of parallelism, there are unique challenges in developing system software on massively parallel hardware to efficiently support a large number of parallel threads. One such challenge is designing a dynamic memory allocator whose task is to allocate memory chunks to requesting threads at runtime. A traditional design of a memory allocator involves maintaining a global data structure, such as a list of free pages. However, the centralized data structure can easily become a bottleneck in a massively parallel system. The bottleneck still exists when multiple queues are maintained, as done in state-of-the-art GPU memory allocation solutions. In this paper, we present a novel approach for designing dynamic memory allocation without a centralized data structure. At runtime, the threads follow a random search procedure to locate free pages. We develop mathematical models to demonstrate that our methods achieve asymptotically lower latency than the traditional queue- based design. Extensive experiments show consistency to our mathematical models and demonstrate that our solutions can achieve up to two orders of magnitude improvement in latency over the best-known existing solutions.