Submitted By: –
The memory hierarchy of high-performance and embedded processors has been shown to be one of the major energy consumers. Extrapolating current trends, this portion is likely to grow in the near future. In this paper, a technique is proposed which uses an additional mini cache, called the L0-cache, located between the I-cache and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, it can effectively eliminate the need for high utilization of the more expensive I-cache.
Cache memories account for an increasing fraction of a chip's transistors and overall energy dissipation. Current proposals for resizable caches fundamentally vary in two design aspects: (1) cache organization, where one organization, referred to as selective-ways, varies the cache's set-associativity, while the other, referred to as selective-sets, varies the number of cache sets, and (2) resizing strategy, where one proposal statically sets the cache size prior to an application's execution, while the other allows for dynamic resizing both across and within applications.
Five techniques are proposed and evaluated that dynamically analyze the program's instruction access behavior and proactively guide the L0-cache. The basic idea is that only the most frequently executed portion of the code should be stored in the L0-cache, since this is where the program spends most of its time. Results of the experiments indicate that more than 60% of the energy dissipated in the I-cache subsystem can be saved.
Cache memories improve overall performance by bridging the speed gap between the CPU and main memory. However, caches can consume a significant share of processing power (almost 43% for the StrongARM SA-110). Many energy-efficient cache architectures have been proposed, but most published articles focus on various cache parameters in single-core architectures, and there is no suitable solution for multi-core cache memory subsystems. A key trade-off between the complexity of caches and power consumption also exists. Although cache memory improves overall performance, it consumes a significant amount of additional power. Excessive power consumption may defeat the performance gain of a multi-core embedded system, as embedded systems suffer from a limited power supply.
Typically, a cache is a special purpose high speed buffer interposed between a processor and a larger but slower speed data memory or data storage device. If data is stored in the cache, then it can be accessed (termed a "hit") in a shorter amount of time by the central processing unit (CPU) than if it were stored in the larger but slower data storage device. If data is not in the cache memory when referenced (termed a "miss"), then access must be made to the slower speed memory or storage device. When this occurs, the cache management software selects that portion of the cache memory containing stale data that has not been requested in a relatively long period of time and replaces it with the data just accessed from the slower data storage device.
Sophisticated cache memory schemes typically select and store data in the cache in anticipation of future use by the CPU to minimize the number of slow accesses to the slower data storage devices. As a result of the combination of the cache memory's speed, size and caching scheme, over 85% of the data requests may be performed with zero wait times. A typical cache memory has its own memory controller. When the CPU sends out a request for data, the cache memory controller intercepts it and checks the cache memory to see if the information is there so that it can be immediately accessed (a "hit"). If the information is not there (a "miss"), the cache memory controller reads the data from the slower system memory. The information is then copied into cache memory as it is sent to the processor so that all subsequent requests for that information can be read from cache memory, with zero wait times.
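The hit/miss flow described above can be made concrete with a minimal Python sketch. The class name, the capacity, and the choice of a least-recently-used (LRU) replacement policy are illustrative assumptions, not details taken from the text:

```python
from collections import OrderedDict

class CacheController:
    """Toy cache controller: it checks the cache on every request and,
    on a miss, fetches from the slow backing store and fills the cache,
    evicting the least recently used (stale) entry when full."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store   # models the slower memory
        self.cache = OrderedDict()           # address -> data, in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        if addr in self.cache:               # hit: serve immediately
            self.hits += 1
            self.cache.move_to_end(addr)     # refresh LRU position
            return self.cache[addr]
        self.misses += 1                     # miss: go to slow memory
        data = self.backing_store[addr]
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)   # evict the stalest entry
        self.cache[addr] = data              # fill for future requests
        return data

# Usage: repeated references to the same addresses mostly hit.
memory = {a: a * 10 for a in range(100)}
ctrl = CacheController(capacity=4, backing_store=memory)
for addr in [1, 2, 1, 3, 1, 2, 4, 5, 1]:
    ctrl.read(addr)
print(ctrl.hits, ctrl.misses)                # 4 hits, 5 misses
```

The controller sits transparently in front of the backing store, mirroring the role of the hardware cache memory controller described above.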
The advantage of caching arises from the tendency of applications to make repeated references to the same data. This clustering of data is commonly termed "locality of reference". Performance measures include the ratio of the number of hits or misses to the total number of input/output (I/O) references. These are termed the "hit ratio" and "miss ratio", respectively. In the past (the 1970s), hit and miss ratios were only determined from reference tables, which limited their utility to static tuning of storage devices.
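The hit and miss ratios defined above are simple counts over a reference trace. A small sketch (the trace and the set of cached addresses are invented for illustration):

```python
def hit_and_miss_ratio(trace, cached):
    """Given an I/O reference trace and the set of addresses currently
    held in the cache, return (hit_ratio, miss_ratio) over the trace."""
    hits = sum(1 for ref in trace if ref in cached)
    total = len(trace)
    return hits / total, (total - hits) / total

# 8 of the 10 references fall on cached addresses 1 and 2.
trace = [1, 2, 1, 3, 1, 2, 1, 4, 1, 1]
print(hit_and_miss_ratio(trace, cached={1, 2}))   # (0.8, 0.2)
```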
In microcomputers, caches are used in two places. First, CPU chips contain separate data and instruction caches that lie between the CPU itself and the dynamic memory external to the chip. This creates an apparent speed-up of the physical memory. Second, caching schemes are used between the microcomputer and its external mass storage devices, especially disk drives. This creates an apparent speed-up of the external mass storage device. These cache schemes are referred to as "disk caches".
For example, there are many types of disk caching schemes implemented for the real mode Disk Operating System (DOS) environment. In addition, such a cache is implemented in the protected mode environment of Windows® 3.11, using memory under the control of Windows®, and consists of several components, including separate cache logic and I/O logic. The memory manager component of the Microsoft Windows® operating system, version 3.1x, is incorporated into the Virtual Machine Manager (VMM) virtual device. The VMM virtual device is responsible for managing the physical memory resources (i.e., RAM) of the computer system, which comprise all available physical memory at the time the Windows® program is started. Some of this physical memory can be devoted to implementing a cache system.
Chapter 1: INTRODUCTION TO THE DYNAMIC CACHE MANAGEMENT TECHNIQUE
In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere. If requested data is contained in the cache (cache hit), the request can be served by simply reading the cache, which is comparatively fast. Otherwise (cache miss), the data has to be recomputed or fetched from its original storage location, which is comparatively slow. Hence, the more requests that can be served from the cache, the faster the overall system performs.
To be cost efficient and to enable an efficient use of data, caches are relatively small. Nevertheless, caches have proven themselves in many areas of computing because access patterns in typical computer applications have locality of reference. References exhibit temporal locality if data is requested again that has been recently requested already. References exhibit spatial locality if data is requested that is physically stored close to data that has been requested already.
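The effect of spatial locality can be sketched with a toy one-line cache that holds the most recently fetched block of consecutive addresses (the block size and access patterns below are invented for illustration):

```python
def block_hits(addresses, block_size=4):
    """Count hits in a one-line cache holding the most recently fetched
    block of `block_size` consecutive addresses. Sequential (spatially
    local) accesses mostly hit; scattered accesses always miss."""
    current_block = None
    hits = misses = 0
    for a in addresses:
        block = a // block_size
        if block == current_block:
            hits += 1
        else:
            misses += 1
            current_block = block
    return hits, misses

sequential = list(range(16))            # strong spatial locality
scattered = [0, 40, 80, 120] * 4        # no locality at block granularity
print(block_hits(sequential))           # (12, 4): one miss per block
print(block_hits(scattered))            # (0, 16): every access misses
```

Temporal locality is the complementary case: re-referencing the same address within the same block also hits, which is why small caches work at all.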
A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory. When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory. Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data.
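The claim that average latency approaches the cache latency is captured by the standard average memory access time (AMAT) formula. The numbers below are illustrative assumptions, not figures from the text:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: every access pays the cache hit
    time; misses additionally pay the main-memory penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed: 1-cycle cache hit, 5% miss rate, 50-cycle memory penalty.
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=50))   # 3.5 cycles
```

With a 5% miss rate the average access costs 3.5 cycles rather than the 50-cycle main-memory latency, which is the sense in which the cache "bridges the speed gap".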
Dynamic cache is an intelligent micro-cache using patterns that allow a user to configure cache content, TTL, and cached-content cleaning per the user's preference. Dynamic cache makes the generation of static pages by means of a CMS unnecessary. Caching the output of servlets, commands, and JavaServer Pages (JSP) improves application performance. WebSphere Application Server consolidates several caching activities, including servlets, Web services, and WebSphere commands, into one service called the dynamic cache. These caching activities work together to improve application performance and share many configuration parameters that are set in the dynamic cache service of an application server.
DYNAMIC CACHE MANAGEMENT TECHNIQUE
In recent years, power dissipation has become one of the major design concerns for the microprocessor industry. The shrinking device size and the large number of devices packed in a chip die, coupled with large operating frequencies, have led to unacceptably high levels of power dissipation. The problem of wasted power caused by unnecessary activities in various parts of the CPU during code execution has traditionally been ignored in code optimization and architectural design. Higher frequencies and larger transistor counts more than offset the lower voltages and smaller devices, resulting in greater power consumption in each new generation of a processor family.
This mechanism can provide the instruction stream to the data path and, when managed properly, it effectively eliminates the need for extensive use of the more expensive I-cache. The basic idea is that only the most frequently executed part of the code is stored in the L0-cache, because that is where the program spends most of its time. The experimental results show that over 60% of the energy dissipated in the I-cache subsystem can be saved. Caches account for an increasing proportion of the transistors on a chip and of its total energy dissipation.
The memory hierarchy of high-performance and embedded processors is one of the major energy consumers. The article describes a new technique that introduces a mini cache, called the L0-cache, located between the level-1 I-cache and the processor. When managed properly, this cache can provide the instruction stream to the data path, reducing the utilization of the more expensive level-1 I-cache.
The proposals for resizable cache designs are broadly classified into two types:
-Based on the cache organization: here the cache's set-associativity is varied by the selective-ways organization, while the number of cache sets is varied by the selective-sets organization.
-Based on the resizing strategy: here one proposal statically sets the cache size prior to an application's execution, while the other allows dynamic resizing both across and within applications.
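The difference between the two organizations can be sketched in terms of cache geometry. The baseline configuration below (256 sets, 4 ways, 32-byte lines) is an assumed example, not taken from the text:

```python
import math

def geometry(num_sets, num_ways, line_bytes=32):
    """Return (size_bytes, set_index_bits) for a cache configuration."""
    return num_sets * num_ways * line_bytes, int(math.log2(num_sets))

# Baseline: 256 sets x 4 ways x 32 B lines = 32 KB, 8 set-index bits.
print(geometry(256, 4))                 # (32768, 8)

# Selective-ways: halve the size by disabling 2 of the 4 ways;
# the set count, and hence the set-index bits, are unchanged.
print(geometry(256, 2))                 # (16384, 8)

# Selective-sets: halve the size by disabling half the sets;
# one fewer address bit now indexes the cache.
print(geometry(128, 4))                 # (16384, 7)
```

This is why the two organizations behave differently: selective-sets changes how addresses map to sets, while selective-ways only changes how many lines compete within each set.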
POWER TRENDS FOR CURRENT MICROPROCESSORS
Because memory hierarchy access latencies very often dominate the execution time of a program, the very high utilization of the instruction memory hierarchy places high energy demands on the on-chip I-cache subsystem. In order to reduce the effective energy dissipation per instruction access, the addition of an extra cache is proposed, which serves as the primary cache of the processor and is used to store the most frequently executed portion of the code.
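The energy argument can be expressed as an effective energy per instruction fetch. The per-access energies and the L0 hit rate below are assumed numbers for illustration; the text only claims that savings above 60% are achievable:

```python
def energy_per_access(l0_hit_rate, e_l0, e_l1):
    """Effective energy per instruction fetch with an L0-cache in front
    of the I-cache: an L0 hit costs only the small L0 access, while an
    L0 miss pays for both the L0 probe and the L1 I-cache access."""
    return l0_hit_rate * e_l0 + (1 - l0_hit_rate) * (e_l0 + e_l1)

# Assumed energies: L0 access 0.1 nJ, I-cache access 1.0 nJ.
base = 1.0                                    # every fetch from the I-cache
with_l0 = energy_per_access(0.8, 0.1, 1.0)    # 80% of fetches hit the L0
print(round(with_l0, 3), round(1 - with_l0 / base, 3))   # 0.3 nJ, 70% saved
```

With these assumed numbers, an 80% L0 hit rate cuts fetch energy by 70%, consistent in spirit with the over-60% savings reported for the I-cache subsystem.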
Chapter 2: WORKING WITH THE L0-CACHE
Some dynamic techniques are used to manage the L0-cache. The problem that the dynamic techniques seek to solve is how to select the basic blocks to be stored in the L0-cache while the program is being executed. If a block is selected, the CPU will access the L0-cache first; otherwise, it will go directly to the I-cache and bypass the L0-cache. In the case of an L0-cache miss, the CPU is directed to fetch instructions from the I-cache and to transfer the instructions from the I-cache to the L0-cache.
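The select-or-bypass decision described above can be sketched in Python. The threshold-based selection rule, the capacity, and the eviction choice below are simplifying assumptions for illustration; they are not one of the paper's five actual techniques:

```python
from collections import Counter

class L0Manager:
    """Sketch of a dynamic L0-cache policy (assumed simplification):
    a basic block is 'selected' once its execution count crosses a
    threshold. Selected blocks are fetched via the L0-cache; all
    others bypass it and go straight to the I-cache."""

    def __init__(self, l0_capacity=4, threshold=3):
        self.counts = Counter()          # per-block execution counts
        self.threshold = threshold
        self.l0 = set()                  # basic blocks resident in L0
        self.capacity = l0_capacity
        self.stats = Counter()

    def fetch(self, block):
        self.counts[block] += 1
        if self.counts[block] < self.threshold:
            self.stats['icache_bypass'] += 1   # not selected: bypass the L0
            return
        if block in self.l0:
            self.stats['l0_hit'] += 1          # cheap L0 access
        else:
            self.stats['l0_miss'] += 1         # fetch from I-cache, fill L0
            if len(self.l0) >= self.capacity:
                self.l0.pop()                  # arbitrary eviction (sketch)
            self.l0.add(block)

# A hot loop body (blocks A, B) dominates; C and D are cold.
mgr = L0Manager()
for block in ['A', 'B'] * 10 + ['C', 'D']:
    mgr.fetch(block)
print(dict(mgr.stats))   # 14 L0 hits, 2 L0 misses, 6 bypasses
```

The hot blocks quickly become resident and serve most fetches from the L0-cache, while the cold blocks never pollute it, which is exactly the behavior the dynamic techniques aim for.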
PIPELINE MICROARCHITECTURE
Figure 1 shows the processor pipeline we model in this research. The pipeline is typical of embedded processors such as the StrongARM.