MIL/NRG - The University of Tennessee
The University of Tennessee Machine Intelligence Lab & Networking Research Group
 
Fabric on a Chip (FoC) [url]
At the heart of any switch or router resides a packet switching fabric subsystem. Broadly speaking, the fabric is responsible for efficiently moving packets from the switch input ports to its output ports. Output-queued (OQ) switching architectures, in which arriving packets are predominantly buffered at the output ports, offer highly desirable properties, including minimal average packet delay, controllable Quality of Service (QoS) provisioning, and work conservation. However, as port densities and data rates increase, the inherent characteristics of OQ switches give rise to prohibitive memory requirements that have traditionally made large port-density OQ systems infeasible to implement. Our work focuses on novel pipelined memory management architectures, which exploit the unique attributes of FPGAs to overcome these limitations. The approach allows for the future realization of a complete switching fabric on a single chip.
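
To give a rough sense of why the memory requirements grow with port density, the sketch below uses a standard back-of-the-envelope argument that is not taken from our project pages: in the worst case every input port sends a packet to the same output in one packet time, so that output's buffer must absorb one write per port plus one read. The function name, port counts, and 10 Gb/s line rate are illustrative assumptions.

```python
def oq_memory_bandwidth_gbps(ports: int, line_rate_gbps: float) -> float:
    """Worst-case bandwidth one output buffer must sustain in an OQ switch.

    Assumption: all `ports` inputs may write to the same output buffer
    during one packet time, while the output reads one packet out.
    """
    writes = ports   # one arriving packet per input port
    reads = 1        # one departing packet on the output link
    return (writes + reads) * line_rate_gbps


if __name__ == "__main__":
    # Hypothetical port counts at an assumed 10 Gb/s line rate.
    for n in (8, 32, 128):
        print(n, "ports:", oq_memory_bandwidth_gbps(n, 10.0), "Gb/s per output buffer")
```

The required bandwidth grows linearly with the port count, which is the scaling pressure that motivates distributing the buffering work across a memory pipeline.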
 
High-Performance Packet Switching Architectures [url]
Packet switching architectures and scheduling algorithms that offer high performance for next-generation switches and routers have been the focus of academic and industry studies in recent years. During the next few years, complex scientific experiments are expected to generate several petabytes of data that will be transferred to geographically distributed terascale computing facilities. The evolution of large-scale private scientific networks will yield engineering challenges similar to those facing Internet architects. One of these key challenges is the development of scalable, high-performance switching fabrics that reliably facilitate the exchange of data between network nodes. These platforms necessitate efficient scheduling and bandwidth management technologies that go well beyond legacy schemes. There is great interest in embracing the challenges of designing, analyzing, and demonstrating scalable, high-performance packet-scheduling algorithms, which, along with complementary switch architectures, will empower next-generation wireline and wireless networks.
 
Securing Wireless Sensor Networks [url]
As sensor networks become one of the key technologies for realizing ubiquitous computing, promising to revolutionize our ability to sense and control the physical environment, security remains a growing concern. Although a wealth of key-generation methods has been developed during the past few decades, they cannot be directly applied to sensor network environments. The resource-constrained characteristics of sensor nodes, the ad-hoc nature of their deployment, and the vulnerability of wireless media call for unique solutions. A fundamental requisite for achieving security is the ability to provide data confidentiality and sensor node authentication. Our effort addresses the need for security solutions in wireless sensor networks (WSNs) by introducing efficient public-key cryptographic methodologies, explicitly designed to accommodate the distinctive attributes of resource-constrained sensor networks.
 
Scalable Reinforcement Learning Systems [url]
Reinforcement learning (RL) corresponds to a broad class of machine learning methods that allow a system to learn how to behave in a stochastic environment based on reward signals. Our research focuses on model-free learning systems with an emphasis on addressing large-scale problems that necessitate real-time functionality. To do so, we investigate schemes that lend themselves to high-speed custom hardware realization, inspired by the massively parallel nature of computations carried out in the brain.
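
As a point of reference for what "model-free" means here, the following is a minimal sketch of tabular Q-learning, a textbook model-free method, and not the hardware-oriented schemes we investigate. The chain environment, the function name `q_learning`, and all parameter values are hypothetical choices made for illustration; the environment is also deterministic to keep the example short, whereas the problems of interest are stochastic.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain of states.

    Actions: 0 = move left, 1 = move right. Entering the last state
    yields reward 1 and ends the episode; all other rewards are 0.
    The agent never uses a model of the transitions, only samples.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Environment step (the left edge is a wall).
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update toward the bootstrapped target.
            # q[goal] stays at zero, so terminal transitions use reward only.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for s, (left, right) in enumerate(q_learning()):
        print(f"state {s}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

The table lookup and the single multiply-accumulate in the update are the kind of simple, repeated operations that map naturally onto parallel hardware, although a table of this form does not scale to large state spaces on its own.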
 
Scheduling in Chip Multiprocessors (CMP) [url]
One of the main bottlenecks in multi-core performance is the shared L2 cache. Cache access latency is typically on the order of 14 clock cycles (7 ns for a 2 GHz core), with main memory latency on the order of 200 clock cycles (100 ns for a 2 GHz core). For efficiency and performance, the cores share memory latency and bandwidth. Latency is the time taken to read the first word of a block from main memory, and bandwidth determines the time taken to retrieve the rest of the block. However, the requests from multiple threads in each processor core must be carefully synchronized to optimize the sharing of resources and minimize inter-core interference (ICI). Main memory management is done by the operating system, but cache memory capacity and bandwidth are managed by hardware (cache controller, memory controller, etc.). As the number of cores increases, the hardware controller (including firmware) should therefore be designed to provide each thread with an efficient share of the resources. Cache performance improves when temporal and spatial locality are captured, which reduces memory access times. Historically, optimizing resource access has been dealt with very elegantly in switching networks, where bandwidth and capacity are shared across multiple requestors and architectures are optimized around one or more performance metrics such as latency, bandwidth, and capacity. In particular, concepts from high-speed switching fabrics can be applied to multi-core processors as the number of cores grows to 16 or more.
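
The cycle counts quoted above can be checked with a few lines of arithmetic. The sketch below converts cycles to nanoseconds and then estimates the time to fetch a whole block as latency plus the streaming time for the remaining bytes. The 64-byte block, 8-byte word, and 8 GB/s bandwidth are assumed values for illustration only, and the function names are hypothetical.

```python
def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds."""
    # A clock of f GHz completes f cycles per nanosecond.
    return cycles / clock_ghz


def block_fetch_ns(latency_ns: float, block_bytes: int, word_bytes: int,
                   bandwidth_gb_per_s: float) -> float:
    """Estimate the time to fetch a full block from main memory.

    The first word arrives after `latency_ns`; the rest of the block
    streams in at the given bandwidth. 1 GB/s equals 1 byte per ns.
    """
    remaining_bytes = block_bytes - word_bytes
    return latency_ns + remaining_bytes / bandwidth_gb_per_s


if __name__ == "__main__":
    l2_ns = cycles_to_ns(14, 2.0)     # 14 cycles at 2 GHz
    mem_ns = cycles_to_ns(200, 2.0)   # 200 cycles at 2 GHz
    print("L2 latency:", l2_ns, "ns")
    print("Main memory latency:", mem_ns, "ns")
    # Assumed block geometry and bandwidth, not figures from the text.
    print("Block fetch:", block_fetch_ns(mem_ns, 64, 8, 8.0), "ns")
```

Under these assumptions the latency term dominates the fetch time, which is why contention that adds to per-request latency across cores is so costly.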
 
Intelligent Transportation Systems [url]
Efficiently controlling the flow of traffic across complex transportation networks remains an intricate challenge. This is particularly true when safety is considered in addition to conventional performance goals such as minimizing delay and maximizing throughput. We develop, analyze, and evaluate novel traffic signaling algorithms that offer high performance while remaining computationally modest. To do so, we employ control-theoretic techniques for which rigorous performance bounds can be established.
 
Hardware Acceleration Platforms for Power Systems Analysis [url]
Power grid analysis, spanning both state (load flow) estimation and contingency analysis, is a computationally heavy process. One mainstream approach to accelerating this process is the use of supercomputing platforms. Although these platforms offer substantial speedup compared to desktop workstations, their price/performance ratio is somewhat high. An alternative approach, taken by our research group, is to employ Field Programmable Gate Array (FPGA) devices as custom-logic acceleration platforms. To that end, our research focuses on distributed hardware architectures that exploit parallelism and concurrency to achieve substantial computational speedup in a cost-efficient FPGA-based framework.