A Survey of Packet Classification Tools and Techniques综述
作者:Anand Prem Kumar V, Vidya Thiyagarajan, 2N. Ramasubramanian
时间:2015 International Conference on Computing Communication Control and Automation
单位:Department of Computer Science and Engineering National Institute of Technology
# 1. INTRODUCTION
Network processors are evolving to
meet the increased bandwidth demands placed on computer networks.
网络处理器正在不断发展,以满足对计算机网络增加的带宽需求。
Packet classification is one of the significant tasks
performed by Network processors in this regard
数据包分类是网络处理器在这方面执行的重要任务之一。
Implementation of classification algorithms
based on clustering, bit vectors and trees on different hardware platforms are studied and performance is compared in terms of
scalability, throughput and memory utilization
翻译:中文多用主动。研究 了基于聚类、位向量和树的 分类算法 在不同硬件平台上的 实现 ,并在可扩展性、吞吐量和内存利用率方面进行了性能比较。
With increase in
network speed, the demand for xxx is also increasing significantly
随着 网络速度的提高,对 xxx 的需求也在显著增加。
In order to meet this requirement the packet classification algorithms are continuously evolving
.
为了满足这一要求,数据包分类算法在 不断发展 。
Packet classification is primarily
based on matching header fields of a packet with an appropriate rule-set.
数据包分类 主要 是基于数据包的头字段与适当的规则集的匹配。
the packets are initially stored in receive buffer and are fed to
micro-engine
数据包最初存储在接收缓冲器中,并被 送入 微型引擎。
Micro engine is solely
responsible for packet processing
微型引擎只负责数据包的处理
It gets the rules from RAM and does the searching using TCAM
它从RAM中获取规则并使用TCAM进行搜索
If the packet is matched with
some of the rules in the rule-set then it is transferred to
transmit buffer, else it is discarded
如果数据包 与 规则集中的一些规则相 匹配 ,那么它就被 转移到 发送缓冲区,否则就被丢弃。
In some of the cases like Software Defined Networking, the matching is done on the basis of
15 fields
在某些情况下,如软件定义网络,匹配是 在 15个字段的 基础上 进行的。
TCAM
is a good option when high throughput is required. It provides high throughput but suffers from high power consumption and cost. SRAM
is better in terms of throughput and power consumption.
当需要高吞吐量时,TCAM是一个不错的选择。它提供了高吞吐量,但有高功耗和高成本的缺点。SRAM在吞吐量和功耗方面更好。
As per
the specified requirements either SRAM or TCAM are combined with Multi-core and FPGAs to satisfy the constraints
根据 指定的要求,SRAM或TCAM与多核和FPGA相结合,以满足限制条件。
笔记
as per 依照、依据
The study provides the current state-of-art scenario
in packet classification algorithmic techniques along with the platform on which they are implemented.
该研究提供了当前数据包分类算法技术的最新情况,以及它们的实施平台。
# 2. CHALLENGES IN PACKET CLASSIFICA TION
The growth in computer networks in terms of size and bandwidth demands is leading to new challenges in packet classification. Continuous evolution of hardware and algorithmic techniques are trying to fulfill the upcoming needs. Some of the major issues have been discussed below.
计算机网络在规模和带宽需求方面的增长,导致了数据包分类方面的新挑战。硬件和算法技术的不断发展正试图满足即将到来的需求。下面讨论了一些主要问题。
# A. Classification Speed
To improve Quality of Service (QoS), high packet classification speed is required. The combination of various hardware and algorithmic techniques has given a wide range of classification speed. It varies from 1 GB/s [9] to 1 Tb /s [10]. There are various other factors like cost and power consumption that need to be considered in association with speed. So achieving highest throughput is not always the most suitable approach for packet classification.
为了提高服务质量(QoS),需要高的数据包分类速度。各种硬件和算法技术的结合给出了一个广泛的分类速度。它从1 GB/s [9] 到1 Tb /s [10] 不等。还有其他各种因素,如成本和功耗,需要与速度联系起来考虑。因此,实现最高的吞吐量并不总是最适合于数据包分类的方法。
# B. Storage Space
The size of memory to be used is generally dependent on set of classification rules. In 99% rule-sets, the number of rules is more than 500. There are only 0.7% cases where numbers of rules are more than 1000 [11]. If there is a requirement of small memory then a high speed and costly memory like TCAM can be utilized
要使用的内存大小一般取决于分类规则集。在99%的规则集中,规则的数量超过了500条。只有0.7%的情况下规则数量超过1000条[11]。如果有小内存的要求,那么可以利用高速和昂贵的内存,如TCAM。
# C. Power Consumption
TCAM memory is very dense and gives high throughput in comparison to SRAM. But it is 100 times more power consuming [12]. So TCAM architecture is optimized using multi level hierarchy and range comparison circuit. In another approach SRAM architecture is optimized by doing parallel comparison [8] in order to meet throughput along with low power consumption. Jiang and Prasanna [7] have also tried to optimize SRAM usage using multi-pipeline architecture to get high throughput at low power.
TCAM存储器非常密集,与SRAM相比,它的吞吐量很高。但是它的耗电量是原来的100倍[12]。因此,TCAM架构使用多级层次结构和范围比较电路进行了优化。在另一种方法中,SRAM结构通过平行比较进行了优化[8],以满足吞吐量和低能耗的要求。Jiang和Prasanna[7]也试图使用多管道结构来优化SRAM的使用,以便在低功率下获得高吞吐量。
# D. Scalability
The fields of the header of a packet vary from 5 [10] [11] [9] to 15 [5]. This leads to varying rule-sets per packet. So algorithmic techniques used in network processors need to be highly scalable. In some approaches the data structures have to be modified with the change in rule-sets [11] [9].According to the fields, pre-processing on packets is done and is stored in some initial data structure like range-tree [13] [14] or hash table [15]. In order to get high throughput, depending on the complexity of rule sets linear search or binary search is chosen.
一个数据包头的字段从5 [10] [11] [9] 到15 [5]不等。这导致了每个数据包的规则集的不同。因此,网络处理器中使用的算法技术需要具有高度的可扩展性。在一些方法中,数据结构必须随着规则集的变化而修改[11] [9]。根据字段大小,对数据包进行预处理,并存储在一些初始数据结构中,如范围树[13] [14]或哈希表[15]。为了获得高吞吐量,根据规则集的复杂性选择线性搜索或二进制搜索。
# 3. ALGORITHMIC TECHNIQUES FOR PACKET CLASSIFICATION
With increase in network service, the need for security and QoS is rapidly growing. To meet these requirements many classification algorithms have been proposed. Decomposition and decision tree are the two widely used techniques for this purpose. The approaches used in these algorithms are discussed below
随着网络服务的增加,对安全和QoS的需求正在迅速增长。为了满足这些要求,许多分类算法已经被提出。分解和决策树 是为此目的而广泛使用的两种技术。这些算法中使用的方法讨论如下
# A. Clustering based algorithm
In Rule point Conversion the rules are clustered based on some heuristics like minimum distance, maximum distance and distance from origin in a multidimensional space where rule is represented as point. Fong et al. [10] have merged
this technique with
simulated annealing to resize cluster in such a way that
each cluster has similar size. In simulated annealing, rules are randomly classified in the first step and initial cost of system is calculated. After this step, swapping of rules is done.
If the new system has a lower cost, then this cluster is finalized.Ahmed and Areibi [9] have also used clustering technique for making of rule sets. In this approach the rules are plotted on a 2 dimensional graph where X-axis is based on IP address of source and Y -axis is based on IP address of destination.
Since a large number of the rules have overlapping region so classification tends to be challenging. Due to this reason higher bits are combined to form a more uniform cluster to make searching simple
在规则点转换中,规则根据一些启发式的方法进行聚类,如最小距离、最大距离和在一个多维空间中与原点的距离,其中规则被表示为点。Fong等人[10]将这一技术与模拟退火法 合并,以这种方式 调整集群的大小,使 每个集群具有相似的大小。在模拟退火法中,规则在第一步被随机分类,系统的初始成本被计算出来。在这一步之后,对规则进行交换。
如果新系统的成本较低,那么这个集群就被最终确定。
Ahmed和Areibi[9]也使用聚类技术来制作规则集。在这种方法中,规则被绘制在一个二维图形上,其中X轴是基于源的IP地址,Y轴是基于目的地的IP地址。
由于大量的规则有重叠的区域,所以分类往往是具有挑战性的。由于这个原因,较高的比特被组合起来 ,形成一个更统一的集群,使搜索变得简单。
笔记
cluster:
N-COUNT 可数名词(人或物的)组,群,簇
# B. Bit Vector Classification
For maintaining linearity of rules Jing and Prasanna [16] have used field-split bit vectors. Here single number value in port are represented as binary and dont care values. Packets are matched against these values, but rules are not combined for making clusters due to which linearity is maintained.
为了保持规则的线性,Jing和Prasanna[16]使用了字段分割的位向量。在这里,端口中的单个数字值被表示为二进制,并且不关心值。数据包与这些值相匹配,但是规则并没有被组合成集群,由于这一点,线性得以保持。
# C. Tree based Classification
For making packet classification scalable
Jiang et al. [17] have used decision trees. The rules are inserted into the tree based on the header fields. In this approach the replication is reduced by making a new internal node for that property and the rules which share that property are the child on that node which makes it more scalable. This tree can be implemented on parallel pipeline for increasing efficiency. Song and Lockwood [18] have merged bit vectors with ternary content addressable memory (TCAM) and trees. Here also fields are based on header of packets. Based on the values in vector, the packets having common header and protocols fields are sent as a single entity to TCAM. For supporting incremental updates port prefix lookup tree is built.
为了使数据包分类具有可扩展性,Jiang等人[17]使用了决策树。这些规则是根据头字段插入树中的。在这种方法中,通过为该属性建立一个新的内部节点来减少复制,共享该属性的规则是该节点上的子节点,这使得它更具可扩展性。这种树可以在并行管道上实现,以提高效率。Song和Lockwood[18]将位向量与三元内容可寻址存储器(TCAM)和树合并。这里的字段也是基于数据包的头。基于向量中的值,具有共同头和协议字段的数据包被作为一个实体发送到TCAM。为了支持增量更新,建立了端口前缀查询树。
# 4. HARDWARE TECHNIQUES FOR PACKET CLASSIFICATION
Hardware improves the performance of algorithmic techniques by providing tailored
resources in the process. These resources are responsible for reducing the access time of the data in the classification techniques .Some of the majorly used hardware techniques are discussed below
硬件通过在这个过程中提供有 针对性的 资源来提高算法技术的性能。这些资源负责减少分类技术中的数据访问时间。一些主要使用的硬件技术讨论如下
笔记
tailored
: ADJ-GRADED 能被表示程度的副词或介词词组修饰的形容词(衣服) 紧身的,合身的 Tailored clothes are designed to fit close to the body, rather than being loose.
# A. TCAM
TCAM is an efficient memory which performs searching in single cycle. Due to this it is a good option in packet matching and classification. Yu and Katz [19] have proposed multi-match classification with single lookup. It is observed to be ten times efficient than
the pure software approach. But this approach is very costly and has high power consumption.
In order to remove these disadvantages Mishra and Sahni [20] have given dual TCAM architecture. Considering the number of movements for each update, it is a better option. Zheng et al [21] have focused on bandwidth limitation and large expansion ratio problem of TCAM. To mitigate
this, they have implemented distributed parallel packet classification scheme on TCAM. It has been incorporated
in Catalyst 6500 [22] for improving scalability where bigger tables are used for packet classification
TCAM是一种高效的存储器,可以在单周期内进行搜索。因此,它是数据包匹配和分类的一个好选择。Yu和Katz[19]提出了单次查找的多匹配分类。据观察,它的效率是纯软件方法的十倍 。但是这种方法的成本很高,而且耗电量也很大。
为了消除这些缺点,Mishra和Sahni[20]已经给出了双TCAM结构。考虑到每次更新的运动数量,这是一个更好的选择。Zheng等人[21]关注了TCAM的带宽限制和大扩展率问题。为了 缓解 这个问题,他们已经在TCAM上实现了分布式并行数据包分类方案。它已被纳入Catalyst 6500[22],以提高可扩展性,其中更大的表被用于数据包的分类。
# B. FPGA based implementation
Attig and Brebner [23] have implemented high level program for packet parsing
over
FPGA in order to achieve a rate of
400Gb/s. In this process the compilation
generates a virtual architecture which is specific to
packet classification.
To solve the problem of high memory requirement in packet classification, Jiang and Prasanna [24] have implemented coarse-grained
set of algorithms on partitioning-based
parallel architecture realized on FPGA. For the same purpose, Qi et al.
[25] have used pipeline based solution on FPGA using Hyper split algorithm. Wanger et al. [6] have used similar architecture using balancing memory distribution. In the latest approach by Jiang and Prasanna [17], FPGA are exploited for
parallelism using multi-pipelining. Decision tree based algorithm are used for 12 tuple rules in this approach.
Attig和Brebner[23]在FPGA上实现了 包解析 的高级程序,以达到400Gb/s的速率。在这个过程中,编译 产生了一个 专门 用于数据包分类的虚拟架构。
为了解决数据包分类中的高内存需求问题,Jiang和Prasanna[24]在FPGA上实现了 基于分区 的并行架构的 粗粒度算法集 。出于同样的目的,Qi等人[25]采用了基于流水线的算法。
[25]在FPGA上使用了基于流水线的解决方案,使用了超分割算法。Wanger等人[6]使用了类似的架构,使用平衡内存分布。在Jiang和Prasanna[17]的最新方法中,FPGA使用多管线来实现并行化。在这个方法中,基于决策树的算法被用于12个元组规则。
# C. SRAM based implementation
SRAM is a promising hardware for packet classification due to the capability of giving high throughput at lower power consumption. V aish et al [26]. have used decision tree based algorithm for implementation with the help of SRAM to exploit these features. In their approach SRAM works as a black box which abstracts computational complexity.
Apart from FPGA Jiang and Prasanna [27] have looked into SRAM for efficient implementation of packet classification algorithm. They have proposed a hybrid of cutting and merging algorithm, which is implemented with the help of SRAM based technology for multi-pipelining to achieve high throughput.
Xing et al. [8] have come up with similar approach where they have checked the results in terms of power consumption apart from throughput. Jiang and Prasanna [7] also later on came up with low power consuming packet classification. They have implemented their decision tree based algorithm for multipipelining on a SRAM to have fixed number of memory access
由于能够以较低的功耗获得较高的吞吐量,SRAM是一种很有前途的数据包分类硬件。V aish等人[26]在SRAM的帮助下,使用了基于决策树的算法来实现这些特性。在他们的方法中,SRAM作为一个黑盒子工作,抽象了计算的复杂性。
除了FPGA之外,Jiang和Prasanna[27]还研究了SRAM以有效实现数据包分类算法。他们提出了一种切割和合并的混合算法,在基于SRAM技术的帮助下实现了多管齐下,以达到高吞吐量。
Xing等人[8]提出了类似的方法,他们在这里检查了除吞吐量以外的功耗结果。Jiang和Prasanna[7]后来也提出了低耗电的数据包分类。他们在SRAM上实现了基于决策树的多颗粒算法,以获得固定的内存访问数量。
# D. Packet Classification on Multi core
Qi et al. [28] have explored the Intel multi-core network processor for implementing hierarchical space aggregation based packet classification. Since the algorithm is specifically optimized for multi-core platform, there is no burst of memory usage and searching process is also fast. It is observed to provide the best search time considering memory usage and classification speed. In another approach [29], the flexibility of Aggre Cut algorithm in terms of data structure is explored for making it specific to Intel IXP2850 32-bit and Cavium OCTEON3860 processor. It further improves the previous approach in terms of classification speed
Qi等人[28]探索了英特尔多核网络处理器来实现基于分层空间聚合的数据包分类。由于该算法是专门为多核平台优化的,所以没有内存的突发使用,搜索过程也很快速。据观察,考虑到内存使用和分类速度,它提供了最佳的搜索时间。在另一种方法[29]中,探索了Aggre Cut算法在数据结构方面的灵活性,使其适用于Intel IXP2850 32位和Cavium OCTEON3860处理器。它在分类速度方面进一步改进了以前的方法
# 5. 在可扩展性和吞吐量方面,对各种硬件和算法的性能进行比较。
The work presented here
discusses important hardware architectures on which various algorithmic techniques have been implemented in order to achieve maximum throughput with efficient scalability. Most of the older proposals match minimum 5 field headers on FGPA and multi-core platforms.
Their throughput is in the range of
1 to 50 Gbps. Due to requirement of high throughput TCAM based architectures have been proposed. Because of necessity of high throughput.
SRAM provides low setup cost and higher throughput also.
Therefore in some approaches hybrid architectures have been used. These are further implemented with techniques like multi-pipelining and layered architecture. Table I describes these approaches and their performance. The continuous improvement in the field of packet classification has made it possible to achieve up to 1 Tbps throughput, with the ability to match 15 header fields with rule-sets. FPGA based techniques are approximately 100 times better then multi-core in terms of classification speed.
这里介绍的工作 讨论了重要的硬件架构,各种算法技术都是在这些架构上实现的,以实现最大的吞吐量和高效的可扩展性。大多数较早的建议在FGPA和多核平台上匹配最小5个字段头。
他们的吞吐量在1到50Gbps的 范围内 。由于高吞吐量的要求,基于TCAM的架构已经被提出。由于高吞吐量的需要。
SRAM提供低设置成本和更高的吞吐量。
因此,在一些方法中使用了混合架构。这些都是通过多管道和分层结构等技术进一步实现的。表一描述了这些方法和它们的性能。数据包分类领域的不断改进使其有可能实现高达1 Tbps的吞吐量,并有能力用规则集匹配15个头域。基于FPGA的技术在分类速度方面大约比多核好100倍。