TruCore: Truminds 5G UPF now runs in 6WINDGate as a plugin
Apr 08, 2021
In this blog, I am going to tell you how we ported the Truminds UPF VPP plugin to run as a plugin inside 6WINDGate.
UPF stands for the User Plane Function. It is a 5G network function which routes traffic from the mobile side to the internet and vice versa. You can imagine that a UPF must handle packets from tens of thousands of mobiles, if not more, so it operates at scale and must be highly efficient in its processing. The following picture tells you about the system context.
To begin with, Truminds already had the UPF codebase which runs as a plugin inside VPP (Vector Packet Processing — https://fd.io/). This means that the UPF codebase compiles to produce a shared library (a .so file). This shared library file is then dropped into VPP and then VPP is started up. All works well and everybody is happy.
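For the curious, loading a plugin into VPP is just a matter of pointing VPP's startup.conf at the shared library. The path and plugin file name below are illustrative, not the actual Truminds artifact names:

```
# startup.conf excerpt (illustrative names and paths)
plugins {
  path /usr/lib/x86_64-linux-gnu/vpp_plugins
  plugin upf_plugin.so { enable }
}
```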
Recently one of our customers told us that they were not using VPP as their base framework for routing and switching, but a commercial router framework called 6WINDGate (https://www.6wind.com), and asked whether it was possible for us to run the 5G UPF as a plugin inside 6WINDGate. So that's how it all began! This blog is all about some of the challenges we faced and the techniques we used to accomplish the task. I must warn upfront that if you are not familiar with VPP, it might turn out to be a little heavy reading, but give it a shot, since you might end up learning something about both VPP and 6WINDGate if you decide to read on nevertheless.
First of all, let's see how the software is really organized on the VPP side. In VPP, the entire software typically runs on N cores of a machine as a single Linux process in userspace. Out of these N cores, core 0 usually runs the slowpath thread and the remaining N-1 cores run the fastpath threads. The slowpath thread follows the typical Linux application programming paradigm: you open sockets and then fall into a select/epoll loop to monitor them, and so forth. The fastpath threads, on the other hand, run in an affinitized fashion on the N-1 cores, and each of these threads runs a busy loop, pulling packets from the NIC directly into userspace (via VPP's DPDK plugin), processing them, and possibly sending packets out directly to the NIC as well. When UPF runs as a plugin in VPP, it uses sockets on the slowpath thread to communicate with the Control Plane (SMF) for call establishment. The actual packet processing is done in the fastpath threads, where VPP hands the IP packets to the UPF plugin over a so-called 'feature arc' on which UPF registers at startup. Effectively, UPF inserts itself into VPP's default IP packet processing pipeline when it starts as a plugin, and begins receiving the IP packets which VPP lifts from the NIC in the fastpath threads. UPF's business logic then processes these IP packets and uses the VPP APIs to send them out.
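To make the 'feature arc' idea concrete, here is a minimal sketch of how a VPP plugin node can hook into the ip4-unicast arc. The node name upf-input and the do-nothing node function are purely illustrative; the real UPF node obviously does a lot more:

```c
#include <vlib/vlib.h>
#include <vnet/vnet.h>
#include <vnet/feature/feature.h>

/* Illustrative node function: a real node walks the buffers in 'frame',
 * applies the UPF business logic and enqueues them to next nodes. */
static uword
upf_input_fn (vlib_main_t *vm, vlib_node_runtime_t *node, vlib_frame_t *frame)
{
  return frame->n_vectors;
}

VLIB_REGISTER_NODE (upf_input_node) = {
  .function = upf_input_fn,
  .name = "upf-input",
  .vector_size = sizeof (u32),
};

/* Place the node on the ip4-unicast feature arc, ahead of ip4-lookup. */
VNET_FEATURE_INIT (upf_input_feature, static) = {
  .arc_name = "ip4-unicast",
  .node_name = "upf-input",
  .runs_before = VNET_FEATURES ("ip4-lookup"),
};
```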
The summary picture of the above looks like the following —
A couple of additional important points about the picture. First, the UPF plugin does not really know anything about DPDK; that is abstracted away by VPP. It is shown in the picture merely to emphasize that packets come into the fastpath threads directly from the NIC into userspace with DPDK (used by VPP), and therefore these packets 'bypass the kernel'.
Second, if the packets bypass the kernel, then how does the slowpath thread, which is using sockets (and therefore an interface with the kernel), get the relevant packets? This is done via a TAP interface, which is a standard virtual ethernet device in Linux. When the fastpath gets a packet and realizes that it needs to be handled by the slowpath via the Linux kernel, the fastpath simply injects the packet back into the kernel using the TAP device; the packet then travels through the kernel the normal way and is received by the slowpath thread over a socket. The reverse is also true: when the application writes a packet to a socket, Linux ensures via proper routing that it goes out via the TAP, it is then received by one of the UPF threads listening on the TAP file descriptor, and UPF writes the packet onto the network using the VPP APIs. The TAP therefore acts like a bridge between the fastpath and the slowpath. The TAP interface is important because once VPP's DPDK plugin takes over packet I/O with the NIC, the corresponding Linux interface for the NIC is no longer under the control of Linux. If you want to learn about TAPs in more detail, here is a good reference — https://backreference.org/2010/03/26/tuntap-interface-tutorial/
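As an aside, creating such a TAP device from userspace takes only a handful of lines against the standard Linux tun/tap API (the tutorial above covers it in depth). A minimal sketch, with error handling trimmed and the interface name left to the caller:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int open_tap (const char *name)
{
  struct ifreq ifr;
  int fd = open ("/dev/net/tun", O_RDWR);
  if (fd < 0)
    return -1;

  memset (&ifr, 0, sizeof (ifr));
  ifr.ifr_flags = IFF_TAP | IFF_NO_PI;       /* TAP = raw ethernet frames */
  strncpy (ifr.ifr_name, name, IFNAMSIZ - 1);

  if (ioctl (fd, TUNSETIFF, &ifr) < 0)
    {
      close (fd);
      return -1;
    }
  /* read()/write() on fd now moves frames to/from the kernel stack. */
  return fd;
}
```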
So that is some heavyweight architecture, the way things are organized on VPP. If we wanted to port this to 6WINDGate, it was naturally important to understand how the fastpath and slowpath are organized on 6WINDGate and how packet input and output are done there. The way things are organized on 6WINDGate is somewhat similar, but there are subtle differences.
The similar part is that there are again fastpath threads in 6WINDGate. There is a set of APIs by which a software plugin can do packet input and output by inserting itself into the default IP pipeline of the 6WINDGate fastpath. The plugin itself is also a shared library. So far so good. The first striking difference was that in 6WINDGate, even though the I/O happens via DPDK, the corresponding Linux interface does not vanish. 6WINDGate automatically creates a virtual Linux interface corresponding to the interface taken over by DPDK. What this means is that the TAP is no longer needed! We already have a sort of ready-made TAP available from 6WIND. This sounded very good, because it is certainly easier to remove code during porting than to add it. Instead of the TAP, UPF could simply use this virtual Linux interface created by 6WINDGate.
What about the slowpath thread though? Well, it turns out that when we run 6WINDGate, only the fastpath threads are spawned; there is no slowpath thread at all. One can write a normal Linux application process which does I/O on sockets. But this was a problem for UPF, because the shared data between slowpath and fastpath is really a set of global variables. These could be shared in the VPP usecase (because slowpath and fastpath were threads of a single Linux process), but could no longer be shared trivially if the slowpath and fastpath were to become different Linux processes. Sharing across processes is certainly possible, but it would have been a difficult porting exercise. Something had to be done to minimize code changes during porting.
What we decided to do was that during the 6WINDGate fastpath startup, when the fastpath threads were being globally initialized, we used an init hook of the UPF plugin to create one more thread which served as the slowpath thread. Since this slowpath thread was part of the same process as the fastpath threads, the problem regarding the global variables went away, and we were quite happy that we could save a lot of effort.
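In pseudo form, the trick is nothing more than a pthread_create() from the plugin's init hook. The names upf_plugin_init() and slowpath_main() below are placeholders, since the actual hook names belong to the 6WINDGate SDK:

```c
#include <pthread.h>

static void *slowpath_main (void *arg)
{
  /* socket setup + select/epoll loop for control traffic with the SMF */
  return NULL;
}

int upf_plugin_init (void)
{
  pthread_t tid;
  /* Runs inside the fastpath process, so the global variables stay
   * shared with the fastpath threads exactly as in the VPP build. */
  return pthread_create (&tid, NULL, slowpath_main, NULL);
}
```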
But just as we were beginning to celebrate the reduced porting effort, we realized that the entire fastpath process of 6WINDGate operates in a different VRF. A VRF is an isolated networking stack in which a process operates, so that it can use its own IP addresses, routes and so on. The newly spawned slowpath thread from the fastpath process thus belonged to this same VRF. So when the spawned slowpath thread tried to open and bind sockets on the IP address of the Linux interface (created by 6WINDGate), the bind was failing. This was because the Linux interface was part of a different VRF (the default VRF) instead of the VRF in which the fastpath was operating. Whoops! If the slowpath cannot even open up a socket on the interface, then how would the communication with the Control Plane for call setup take place? The 6WINDGate team provided excellent support here. They advised that there was an API with which the fastpath code can temporarily switch to the default VRF and then switch back to the VRF in which it normally operates. So we used these APIs in the slowpath thread spawned from the fastpath: when the slowpath thread was opening up its sockets we switched VRFs, and after that we switched back to the fastpath VRF. Now the slowpath was able to open up sockets and we could communicate with the Control Plane. This was one of the useful tricks.
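The actual VRF-switching API is part of the 6WINDGate SDK, so I won't reproduce it here, but the pattern is the same as temporarily hopping into another Linux network namespace around socket creation and then hopping back. A rough sketch of that pattern, purely as an analogy (the /var/run/netns/default path is just an example):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

int open_socket_in_default_vrf (void)
{
  int cur = open ("/proc/self/ns/net", O_RDONLY);      /* remember where we are */
  int def = open ("/var/run/netns/default", O_RDONLY); /* target "VRF"          */
  int sock = -1;

  if (cur >= 0 && def >= 0 && setns (def, CLONE_NEWNET) == 0)
    {
      sock = socket (AF_INET, SOCK_DGRAM, 0); /* e.g. the PFCP socket */
      setns (cur, CLONE_NEWNET);              /* switch back afterwards */
    }
  if (cur >= 0) close (cur);
  if (def >= 0) close (def);
  return sock;
}
```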
The next challenge was to figure out how to make the UPF sit inside the IP pipeline of 6WINDGate. While VPP has the concept of a 'feature arc', we learnt that in 6WINDGate the concept is that you register a 'hook' function in the IP pipeline. This mapped quite well, and we could port the placement of UPF in the IP pipeline with relative ease.
What about the packet input and output though? In VPP we have the vlib frames which contain the vlib buffers. We found that 6WINDGate has a similar concept called a 'bulk', which maps to the vlib frames of VPP. Likewise, for vlib buffers there are mbufs in 6WINDGate. So all references to vlib frames and vlib buffers were converted to the 6WINDGate bulk and mbuf respectively. This was a slightly laborious task but relatively straightforward. Of course, there were several functions operating on these data structures, so all of those had to be ported as well. For example, if there was a function to concatenate two vlib buffers, it had to be ported to concatenate two mbufs in the 6WINDGate usecase. All this had to be done with surgical precision.
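To give a flavour of what this buffer-level porting looks like, here is the concatenation example sketched on both sides. The 6WINDGate mbuf accessors are part of its proprietary SDK, so the DPDK rte_mbuf call they resemble is shown instead, purely as an analogy:

```c
#include <vlib/vlib.h>
#include <rte_mbuf.h>

/* VPP side: chain the buffer with index 'tail_bi' behind 'head'.
 * (A complete helper would also fix up the head's total length.) */
static inline void
vpp_chain (vlib_buffer_t *head, u32 tail_bi)
{
  head->next_buffer = tail_bi;
  head->flags |= VLIB_BUFFER_NEXT_PRESENT;
}

/* DPDK-style side: the equivalent operation on mbufs. */
static inline int
mbuf_chain (struct rte_mbuf *head, struct rte_mbuf *tail)
{
  return rte_pktmbuf_chain (head, tail); /* 0 on success */
}
```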
The last challenge we had to deal with was inter-core switching. UPF has the concept of PFCP PDU sessions, and each session is processed on a single fixed core so that locks can be avoided. This means that when a packet enters a fastpath thread, the session for that packet is first located based on various fields in the packet. The session tells which CPU core must handle that packet further. The packet is then switched to the relevant CPU core, where the processing continues as usual till completion. The inter-core switching APIs were naturally different in 6WINDGate than in VPP, but fortunately 6WINDGate provided some good examples of how inter-core switching is done there, and we could use those APIs while porting to achieve similar behaviour.
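The heart of it is just a deterministic session-key-to-core mapping followed by a framework-specific hand-off. Everything below (the key, the worker count, the enqueue call) is hypothetical and only illustrates the idea:

```c
#include <stdint.h>

#define N_WORKER_CORES 4   /* hypothetical number of fastpath workers */

/* Map a session key (e.g. the GTP-U TEID or the UE IP address) to the
 * worker core that owns that PFCP session. Same key, same core, no locks. */
static inline uint32_t
session_to_core (uint32_t session_key)
{
  return session_key % N_WORKER_CORES;
}

/* On receive, a worker that is not the owner hands the packet over:
 *
 *   uint32_t target = session_to_core (teid);
 *   if (target != my_core_id)
 *     enqueue_to_worker (target, pkt);   // framework-specific hand-off
 *   else
 *     process_locally (pkt);
 */
```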
So there we are! Truminds now has a 5G UPF which runs as a plugin in 6WINDGate too, apart from running as a plugin in VPP, and natively as a DPDK or pure Linux sockets based application.
The bottom line is that if you have a well written plugin in VPP, it can certainly be ported to 6WINDGate. If you ever run into a usecase like this, feel free to get in touch with Truminds and we can help you do rapid work there. Likewise, if you need a ready-made UPF plugin for 6WINDGate, we've got your back; just give us a call.
That's all for this time folks, stay tuned for more!