3GPP Release 14 has specified the architecture for control and user plane separation in the mobile packet core, enabling the user plane to be split from the control plane for the following network elements:
- SGW (SGW-C and SGW-U)
- PGW (PGW-C and PGW-U)
- TDF (TDF-C and TDF-U)
This raises the following questions:
- What could be the potential architecture for these user plane nodes?
- Can white box switch silicon be directly used as 3GPP user plane elements?
Before we explore the answers to these, let's have a look at the requirements of a 3GPP user plane node.
For PGW-U and/or TDF-U:
- In the downlink direction, the ability to classify a packet and perform bearer binding (i.e., encapsulate the packet into the right GTPU tunnel).
- Ability to detect idle flows and report them to the C plane so that the C plane can take actions like terminating the flow / bearer / PDN.
- Counting the uplink and downlink volume of traffic per flow.
- Counting the uplink and downlink usage time (i.e., counting only the time for which the flow is actively involved in forwarding, ignoring the time for which the flow was idle).
- Support aggregate bit rate enforcement on a group of flows (i.e., the ability to define meters and associate a meter with multiple flows).
- Application specific packet classification (optional), for Application Detection and Control (ADC). This means the user plane supports deep packet inspection for a variety of applications, and can also be configured with a mapping from an application ID to the L4-L7 packet filters used for matching/classifying incoming traffic as that application.
- Ability to configure the packet marking policy per traffic steering policy identifier, and the ability to associate a flow with a traffic steering policy.
- Ability to report to the C plane, at a per-flow level, when configured volume and time thresholds are reached. In summary, the user plane should count per-flow traffic volume and should also support running timers per flow; a sketch of the resulting per-flow state follows this list.
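Taken together, these requirements translate into a fairly rich per-flow state record that the user plane must keep and update at line rate. Below is a minimal sketch in C of what such a record might hold; all field names and sizes are illustrative assumptions, not anything 3GPP defines.

```c
#include <stdint.h>

/* Illustrative per-flow state for a PGW-U/TDF-U; names and sizes are
 * assumptions made for this sketch. */
struct flow_state {
    /* Classification key: the 5-tuple the downlink packet matches on. */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;

    /* Bearer binding: which GTPU tunnel to encapsulate into. */
    uint32_t gtpu_teid;        /* egress tunnel endpoint ID            */
    uint32_t gtpu_peer_ip;     /* tunnel peer address                  */

    /* Usage reporting state, reported to the C plane on thresholds. */
    uint64_t ul_bytes, dl_bytes;  /* per-flow volume counters          */
    uint64_t vol_threshold;       /* report when volume crosses this   */
    uint64_t time_threshold_us;   /* report when usage time crosses    */
    uint64_t active_usecs;        /* usage time, idle periods excluded */
    uint64_t last_pkt_tsc;        /* last-packet timestamp, used for
                                     idle detection and usage time     */

    /* Aggregate bit rate enforcement: many flows can share one meter. */
    uint16_t meter_id;

    /* Packet marking / traffic steering policy for this flow. */
    uint16_t steering_policy_id;
};
```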
For SGW-U:
- Ability to buffer packets when the bearer is in IDLE state (due to the UE being in EMM-IDLE).
- Ability to modify the GTPU tunnel header and switch the tunnel from the ingress side to the egress side (see the sketch below).
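On the GTPU side, the tunnel switching itself is a small header rewrite. The sketch below shows the fixed 8-byte GTP-U header (mandatory part, per 3GPP TS 29.281) and an illustrative helper that swaps the ingress TEID for the egress one; the helper name and the way the egress TEID is supplied are assumptions, since in practice it would come from the session state the SGW-C installs over Sx.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Mandatory 8-byte GTP-U header (no optional fields), per TS 29.281. */
struct gtpu_hdr {
    uint8_t  flags;     /* 0x30: version 1, PT=1, no optional fields  */
    uint8_t  msg_type;  /* 0xFF = G-PDU (encapsulated user packet)    */
    uint16_t length;    /* length of payload following this header    */
    uint32_t teid;      /* tunnel endpoint ID, network byte order     */
} __attribute__((packed));

/* Tunnel switching at the SGW-U: the inner packet stays intact; the
 * TEID learned on the ingress leg is replaced by the one assigned on
 * the egress leg (outer IP/UDP rewrite elided for brevity). */
static inline void gtpu_switch_tunnel(struct gtpu_hdr *h,
                                      uint32_t egress_teid)
{
    h->teid = htonl(egress_teid);
}
```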
All of the above requirements need to be supported for at least a few million flows (today's state-of-the-art EPC gateways support a few million bearers). To hold the bearer binding, traffic classification, timer and volume-threshold state at a per-flow level for millions of flows, the user plane entity needs enough memory, but for line-rate forwarding the memory access must not become a bottleneck. Hence, if we want to achieve line-rate forwarding with white box silicon based forwarders, they would need a huge amount (at least 2 to 4 GB) of on-chip SRAM or TCAM to hold the flow classification and bearer binding rules. However, on-chip SRAM and TCAM at such capacity would drive the cost of the silicon so high that building customized, high-capacity silicon just for the sake of the mobile packet core would defeat one of the fundamental premises of control/user separation, i.e., the ability to scale the data plane with cheap forwarders.

Power is a problem too. A single TCAM that supports a lookup space of a few million entries has that many million addressable locations. During a TCAM lookup, all of these locations/cells are charged and all but one discharge, and this consumes power. Note that a TCAM stores contents, and comparators are used to locate stored content for a given key; so the more the storage, the more cells are charged and discharged, and hence the more the power. Beyond a point, adding too many cells on a single chip will heat it up to the extent that it burns (see the quote from the paper below).
- Quote from the paper - "One of the major issues in TCAM, or in general any CAM, is their very high dynamic power consumption. The main reason behind this issue is the fully parallel nature of search operation. This fully parallel search operation causes all the match lines in a TCAM block to charge in their precharge phase and allows all but one match line to discharge during their evaluation phase. The one match line which does not discharge during evaluation phase indicates the match in the search operation."
Chaining multiple TCAMs, each covering a lookup space of a few hundred thousand entries, is however possible. But this will increase the cost proportionately.
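To make the 2 to 4 GB figure above concrete, here is a back-of-envelope calculation, a sketch only: the flow count and the per-flow byte budget are illustrative assumptions, not measured figures.

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions: a few million flows, each needing a
     * classification entry plus bearer binding, counters and timer
     * state -- say roughly 512 bytes per flow all told. */
    const unsigned long long flows          = 4ULL * 1000 * 1000;
    const unsigned long long bytes_per_flow = 512;

    unsigned long long total = flows * bytes_per_flow;
    printf("%llu flows x %llu B/flow = %.1f GB of lookup/state memory\n",
           flows, bytes_per_flow, total / 1e9);
    /* ~2 GB: easy for DRAM on an x86 server, but far beyond the
     * on-chip SRAM/TCAM budget of commodity switch silicon. */
    return 0;
}
```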
So, to meet the cost, power efficiency and scale targets, one has to run the data plane on x86, with the forwarding logic in user space bypassing the kernel through techniques like DPDK. DRAM provides the capacity, but DRAM access latency can become the bottleneck for line-rate processing. Hence the data structures in DRAM need to be laid out carefully to exploit the cache hierarchy, so that frequently used flow rules are placed together and thus stand a good chance of being in the cache most of the time.
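As a sketch of what such careful placement can look like, the snippet below packs each flow's hot state into exactly one 64-byte cache line, so a fast-path lookup touches a single line of DRAM and frequently hit flows tend to stay resident in the cache hierarchy. The structure and names are illustrative assumptions; in a real DPDK build the key lookup would typically go through a library such as rte_hash, with memory allocated from hugepages.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hot per-flow state, sized and aligned to one 64-byte cache line so a
 * lookup on the forwarding fast path touches a single line of memory.
 * Field names are illustrative, not from any standard. */
struct flow_entry {
    uint64_t key_hash;     /* precomputed hash of the flow's 5-tuple   */
    uint32_t gtpu_teid;    /* bearer binding for encapsulation         */
    uint32_t peer_ip;      /* tunnel peer (eNodeB / SGW-U) address     */
    uint64_t dl_bytes;     /* hot counters updated on every packet     */
    uint64_t ul_bytes;
    uint64_t last_pkt_tsc; /* for idle detection and usage time        */
    uint8_t  pad[24];      /* pad the record out to one cache line     */
} __attribute__((aligned(64)));

static struct flow_entry *flow_table;

/* Allocate the table cache-line aligned; a real DPDK pipeline would
 * use hugepage-backed memory (e.g. rte_malloc) to also cut TLB misses. */
static int flow_table_init(size_t entries)
{
    return posix_memalign((void **)&flow_table, 64,
                          entries * sizeof(struct flow_entry));
}
```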
In summary:
The hardware architecture for the user plane of the mobile packet core requires x86 with enough DRAM. Custom white box switch silicon / silicon supporting PISA (Protocol Independent Switch Architecture) cannot directly support all the requirements of an EPC forwarder, at least as of now.