CKRM Based Memory Management

Introduction

With its strong record on performance, scalability, availability, and security, Linux has become one of the major operating system choices for enterprise servers. A key requirement for enterprise servers is support for running multiple workloads simultaneously, so administrators must balance the resource needs of each workload against the often conflicting goal of high system utilization. Class-based kernel resource management (CKRM) allows system administrators to provide differentiated service at the user or job level. Processes are grouped into classes, and each class is granted a share of system resources such as CPU, memory, I/O, and network.

This project addresses class-based physical memory management. Processes are assigned to classes based on predefined rules and policies and/or application tags, and memory usage information is maintained at the class level. The workload management module (eWLM) monitors the performance of all workloads and decides, through the class management module, how many shares of each system resource to grant to each class. The VM subsystem obtains this share information and enforces memory allocation and reclamation according to the memory shares.

Figure 1 shows the modifications we made to Linux memory management to support class-based memory control. Classes are defined in terms of processes. In the simple case, each process is associated with one class. The rules and policies used to classify processes can be based on pathname, uid, application tags, etc. In more complicated cases, such as an address space shared between classes, we statically associate the address space with one class. This could be improved later by introducing a hierarchical class structure, which is still under development.

The implementation includes setting the class of a process through a system call, proc entries, or its pathname; bean-counting of physical page usage per class; and enforced class-based page reclamation. For page reclamation, we divide all classes into two categories, under-share and over-share. The arbitrator chooses all over-share classes as victims and reclaims pages from them.
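The victim selection described above can be illustrated with a small user-space sketch. It is not the kernel code: the MemClass type, the rule "usage = active + inactive pages", the test "usage > share", and the idea of reclaiming exactly the excess are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemClass:
    """Hypothetical per-class accounting record (not the kernel structure)."""
    index: int
    share: int      # maximum number of physical pages granted to the class
    active: int     # pages currently on the active list
    inactive: int   # pages currently on the inactive list

    @property
    def usage(self) -> int:
        # Assumption: a class's usage is the sum of both page lists.
        return self.active + self.inactive


def select_victims(classes):
    """Return the over-share classes, i.e. those using more pages than their share."""
    return [c for c in classes if c.usage > c.share]


def reclaim_targets(classes):
    """Map each over-share class index to the number of pages above its share."""
    return {c.index: c.usage - c.share for c in select_victims(classes)}
```

With classes (0, share 12000, usage 10122), (1, share 5000, usage 6000), and (2, share 10000, usage 0), only class 1 is over-share, and under these assumptions it would be asked to give back 1000 pages.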

Figure 1: Class-based memory share control for page reclaiming.

Prototype

We have implemented a prototype on Linux 2.5.69. Try the patch here. This is a development version that still includes a lot of test code and obsolete code. A clean patch will be posted later.

In this simple version, class shares are specified manually by writing to the proc entry /proc/memdiff/classdef. Displaying the content of the entry shows the current class share definitions:

Example:
2 10000 0 0
1 0 0 0
0 12000 8888 1234

Each row has four elements, in display order: the class index, the memory share expressed as the maximum number of physical pages, the number of pages currently owned in the active list, and the number of pages currently owned in the inactive list. To modify a class or add a new one, echo $index $share to the proc entry. Note that a new class must have an index equal to the current maximum index plus 1.

Example:
echo 0 24000 > /proc/memdiff/classdef
echo 3 24000 > /proc/memdiff/classdef
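As a rough illustration of the format, the following sketch parses the displayed class definitions and builds the echo command for an update. The function names parse_classdef and classdef_command are hypothetical helpers, not part of the patch, and the sketch assumes the entry prints plain whitespace-separated integers, four per class.

```python
def parse_classdef(text):
    """Parse whitespace-separated integers into {index: fields}, four values per class."""
    values = [int(tok) for tok in text.split()]
    if len(values) % 4 != 0:
        raise ValueError("expected four fields per class")
    rows = {}
    for k in range(0, len(values), 4):
        index, share, active, inactive = values[k:k + 4]
        rows[index] = {"share": share, "active": active, "inactive": inactive}
    return rows


def classdef_command(rows, index, share):
    """Build the echo command that updates class `index` or adds it as a new class."""
    # A new class must use the current maximum index plus 1.
    next_index = max(rows) + 1 if rows else 0
    if index not in rows and index != next_index:
        raise ValueError(f"new class index must be {next_index}")
    return f"echo {index} {share} > /proc/memdiff/classdef"
```

For the listing shown earlier, the existing indices are 0, 1, and 2, so the helper accepts index 3 for a new class and rejects, for example, index 5.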

Currently the class of a process is determined by its pathname. A process whose pathname is in /classN/… is assigned to the class with index N. All processes whose pathnames are not in those folders inherit their parent's class, or are assigned to the default class 0 if they have no parent.
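The pathname rule can be sketched in a few lines. This is only an illustration of the rule as stated, not the kernel implementation; the classify function is hypothetical, and it assumes the /classN/ directory sits at the start of the pathname.

```python
import re

# Assumption: the class directory is the first component of the pathname.
_CLASS_DIR = re.compile(r"^/class(\d+)/")
DEFAULT_CLASS = 0


def classify(path, parent_class=None):
    """Return the class index for a process with the given executable pathname."""
    match = _CLASS_DIR.match(path)
    if match:
        return int(match.group(1))   # pathname is in /classN/...
    if parent_class is not None:
        return parent_class          # otherwise inherit the parent's class
    return DEFAULT_CLASS             # no parent: fall back to default class 0
```

For instance, classify("/class2/bin/app") gives 2, while a process started from /usr/bin by a parent in class 3 stays in class 3.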

Experiments

We tested the prototype using both an artificial memory workload and a real benchmark, 173.applu from the SPEC CPU2000 benchmark suite.

The artificial memory access simulator generates a memory workload whose page accesses follow an exponential probability distribution. The probability distribution is given by the following function:

P(i) = ( (1-exp(-lambda/N)) / (1-exp(-lambda)) ) * exp(-lambda*i/N)

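where P(i) is the probability of accessing the i-th page (i = 0, ..., N-1), N is the total number of pages, and lambda is the rate parameter of the distribution; larger values of lambda concentrate accesses on low-index pages.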
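As a quick sanity check on the distribution, the sketch below evaluates P(i) for every page. It takes the normalizing constant as (1-exp(-lambda/N)) / (1-exp(-lambda)) and indexes pages from 0 to N-1, which is the choice that makes all probabilities positive and sum to one; the function name access_probabilities is hypothetical.

```python
import math


def access_probabilities(n_pages, lam):
    """Return [P(0), ..., P(N-1)] for the truncated exponential access distribution."""
    # Normalizing constant chosen so that the N probabilities sum to 1.
    norm = (1.0 - math.exp(-lam / n_pages)) / (1.0 - math.exp(-lam))
    return [norm * math.exp(-lam * i / n_pages) for i in range(n_pages)]
```

The returned list decreases monotonically with the page index, so page 0 is the most frequently accessed page and page N-1 the least.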

The mapping from a random number rand(), uniformly distributed in [0, 1), to the accessed page index is given by

page_index = log( 1 - rand()*( 1-exp(-lambda) ) )*N / (-lambda)
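The mapping is an instance of inverse transform sampling, and a minimal sketch of it follows. It assumes the random number is uniform in [0, 1) and that the real-valued result is truncated to an integer page index; the function name sample_page_index and the final clamp against floating-point rounding are additions for illustration, not taken from the simulator.

```python
import math
import random


def sample_page_index(n_pages, lam, rng=random):
    """Draw one page index in [0, n_pages) using the mapping above."""
    u = rng.random()  # uniform in [0, 1)
    x = math.log(1.0 - u * (1.0 - math.exp(-lam))) * n_pages / (-lam)
    # Truncate to an integer index; the clamp only guards against rounding at the top end.
    return min(int(x), n_pages - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    trace = [sample_page_index(1000, 5.0, rng) for _ in range(10)]
    print(trace)
```

Passing a seeded random.Random instance makes a generated access trace reproducible across runs.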

The following slides present the basic results of our experiments.