Design Document: Security Groups
Introduction
The goal of this design document is to outline the implementation of security groups. Security groups will contain a set of rules that specify allowed ports and protocols on the driver interface. This will be managed using nftables and initially support Linux devices only.
Phased Implementation
Phase I - Default Security Group for Organization
This phase intends to work out the end-to-end implementation of policy from the API down to each device. To avoid making it harder than necessary, the first phase will only allow a single security group per organization that is managed by the organization owner.
- The default security group for an organization will be applied to all devices on startup.
- A device can only have one policy group applied at any given time in phase I. This means each organization will have a security group. Only the organization owner can CRUD the security group for a given organization. All users in the organization will have read access to this security group.
- When a new set of rules is applied, clear the appropriate chain and re-apply all of the rules. This could be managed more elegantly, but incremental updates introduce a great deal of complexity when rules overlap, because a rule may not be installed in the tables when another rule supersedes it. See the section Rule Deconfliction.
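The clear-and-re-apply step in the last bullet can be sketched as building one nft script that flushes the chain and then adds every rule again. This is a minimal sketch under stated assumptions, not the agent's implementation: the function name buildReplaceScript is hypothetical, the table and chain names are the ones proposed later in this document, and the rule strings are assumed to be already rendered in nft syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// buildReplaceScript returns an nft script that empties the given chain and
// then re-adds every rule in order. The name and signature are hypothetical.
func buildReplaceScript(table, chain string, rules []string) string {
	var b strings.Builder
	// Flushing removes all rules from the chain but keeps the chain itself.
	fmt.Fprintf(&b, "flush chain inet %s %s\n", table, chain)
	for _, r := range rules {
		fmt.Fprintf(&b, "add rule inet %s %s %s\n", table, chain, r)
	}
	return b.String()
}

func main() {
	script := buildReplaceScript("nexodus", "nexodus-inbound", []string{
		`iifname "wg0" ct state established,related counter accept`,
		`iifname "wg0" tcp dport 22 counter accept`,
		`iifname "wg0" counter drop`,
	})
	fmt.Print(script)
}
```

A script like this could be passed to nft -f, which applies the whole file as a single transaction, so the chain is never left half-populated.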
Phase II - User-owned Security Groups
This phase adds user-defined security groups. When a user has applied a security group to a device, it takes the place of the organization's default security group on that device.
- Users can create security groups for more granular policies per device. The user owns the security group. This would be similar to how an EC2 security group is applied to an instance; here, the group would be applied to a device with a patch.
- A device can only have one policy group applied at any given time. This means that a device will either have a user-defined policy OR the default security group for the organization applied.
- Only the user that owns the policy should have CRUD access to it.
- All members of an organization should have read access to all policies applied to any device in the organization.
Phase III - Support for Multiple Security Groups
Prior phases only allowed a single security group policy per device. This phase would extend it to allow multiple policies.
- A device has the default policy for the organization applied by default.
- Instead of a device having a single policy, it now has a list of policies. Once one or more user-defined policies are applied, that list is used instead of the default security group, not in addition to it.
- Additional user-defined policies can be applied to a device. Their access should be the same as in phase II.
- The resulting behavior is that the allowed traffic is the sum of all policies as if they were combined into one, so combining rules into the same chains as before should be sufficient.
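The Phase III selection logic described above can be sketched as follows. Because the rules are allow-only, the sum of several policies is simply the concatenation of their rules. The Rule and Group types and the effectiveInbound function are hypothetical, simplified stand-ins for illustration, not the proposed models.

```go
package main

import "fmt"

// Rule mirrors the fields of the rule JSON used in this document.
type Rule struct {
	IpProtocol string
	FromPort   int64
	ToPort     int64
	IpRanges   []string
}

// Group is a simplified, hypothetical stand-in for a security group.
type Group struct {
	Name    string
	Inbound []Rule
}

// effectiveInbound returns the inbound rules to install on a device.
// If any user-defined groups are attached, their rules are concatenated and
// the organization default is ignored; otherwise the default is used.
func effectiveInbound(orgDefault Group, userGroups []Group) []Rule {
	if len(userGroups) == 0 {
		return orgDefault.Inbound
	}
	var rules []Rule
	for _, g := range userGroups {
		rules = append(rules, g.Inbound...)
	}
	return rules
}

func main() {
	orgDefault := Group{Name: "default", Inbound: []Rule{{IpProtocol: "ipv4"}}}
	ssh := Group{Name: "ssh", Inbound: []Rule{{IpProtocol: "tcp", FromPort: 22, ToPort: 22}}}
	dns := Group{Name: "dns", Inbound: []Rule{{IpProtocol: "udp", FromPort: 53, ToPort: 53}}}

	fmt.Println(len(effectiveInbound(orgDefault, nil)))               // default only
	fmt.Println(len(effectiveInbound(orgDefault, []Group{ssh, dns}))) // user groups replace the default
}
```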
Future Enhancement Ideas
This section includes ideas for future enhancements that need further exploration to define and determine if and how we would like to add them.
Robust Admin Organization Policies
- Overarching admin policies can be overlaid on top of, or used in lieu of, individual user policies. Further exploration needs to happen here.
- Admin policy should be able to override user policies, meaning an organization admin may want to be able to restrict which type of traffic can be opened by a user. This is very different than the policy features from the previous phases.
- Once more detail is available on the types of policy we would like to allow here, it can be further defined.
Symbolic Names
It would be convenient to be able to refer to devices with symbolic names instead of only IP ranges. This could potentially be based on labels or tags applied to devices, for example.
Default Security Group and Rules
Inbound rules: Ultimately, the default security group will drop all inbound traffic unless it matches the established connection state, that is, traffic that is part of an existing connection initiated by the device. In Phase I, all inbound traffic will be allowed until a user-friendly mechanism to install rules in the UI or CLI is complete.
Outbound rules: The agent will add a deny rule at the end of the egress chain only when an explicit allow rule is provisioned by the user.
- The default for Phase I of security groups is to permit any traffic in both directions. There will be one nftables table named nexodus containing two chains, nexodus-inbound and nexodus-outbound. While these chains could be completely empty by default, I would propose that the inbound chain have some basic permit-any rules accompanied by a drop-all rule. This is primarily to give some burn-in time for any potential issues, along with getting accustomed to defining a default policy, since the explicit allow will eventually become an implicit deny-by-default rule on inbound traffic only if we follow the EC2-style model. The egress chain will allow all traffic by default with an implicit allow-all, meaning an accept chain with no rules. If the user defines an outbound policy for some protocol, destination address, or destination ports, those allow rules would be added, followed by a drop rule.
- At this time, our security group rules have no order associated with them. This is possible because there are no deny rules. As a reference, compare EC2 rules, which do not allow deny statements, with Azure rules, which do. The order begins to matter when deny rules are in place. This adds complexity that, for our use case, does not add any clear value.
- Users can add ranges of a given field. For example, an IpRanges value of 100.100.0.100-100.100.0.120 is valid, and a prefix such as 100.100.0.128/25 is also valid, as is a single address such as 100.100.0.10.
- The same applies to port ranges: FromPort:8080 coupled with ToPort:9000 would equate to a rule of 8080-9000 being permitted. FromPort:0 ToPort:0 will be read as ip permit <protocol> any. FromPort:443 ToPort:443 would be equivalent to ip permit <protocol> 443.
- L3 IpRanges are applied based on the direction field they are located in. SecurityGroups.InboundRules have the IP prefix applied to the saddr field in nftables in the input chain, while OutboundRules are applied to the daddr field in the outbound chain.
- The layer 3 address is either the source address for inbound rules or the destination address for outbound rules.
- In regard to L4 ports, destination ports are what will be supported. For example, an ingress rule of input tcp dport 22 counter accept would apply port 22 to the L4 dport value in the nexodus-inbound chain, meaning any host can connect to the node on port 22.
- The layer 4 port will always apply to the destination port, regardless of whether it is in the inbound or outbound chain.
- There are some scenarios where data will need to be normalized. It also makes sense to pre-process rules for type checking and valid inputs before they arrive at the API server, in locations such as nexctl and the UI, but there will likely need to be some rule validation in the security group handler as well. Here are some examples:
  - A user could specify protocol ip with ports 100-200. We would infer that to mean TCP and UDP permit dport 100-200, performed in two rules.
  - Also, we may want to force ipv4 or ipv6 rather than allowing a generic ip value in Protocol. Alternatively, we can assume that ip implies both protocol families, v4 and v6. The same applies to ICMP: icmpv4 and icmpv6. Once we narrow in on the user experience via the web UI, the appropriate path will likely be obvious.
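The port and address conventions in the list above can be sketched as two small rendering helpers. This is an illustration only: portMatch and addrMatch are hypothetical names, the output is an nft match expression rather than a complete rule, and for an x-y range only the first address is parsed, purely to pick the ip or ip6 family keyword.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// portMatch renders the destination-port part of a rule.
// 0/0 means "any port", so no port match is emitted.
func portMatch(proto string, from, to int64) string {
	switch {
	case from == 0 && to == 0:
		return ""
	case from == to:
		return fmt.Sprintf("%s dport %d", proto, from)
	default:
		return fmt.Sprintf("%s dport %d-%d", proto, from, to)
	}
}

// addrMatch renders the address part of a rule. Inbound rules match the
// source address (saddr); outbound rules match the destination (daddr).
func addrMatch(inbound bool, ipRange string) (string, error) {
	// Use the first address of an x-y range to detect the address family.
	first, _, _ := strings.Cut(ipRange, "-")
	var addr netip.Addr
	if p, err := netip.ParsePrefix(first); err == nil {
		addr = p.Addr()
	} else if a, err := netip.ParseAddr(first); err == nil {
		addr = a
	} else {
		return "", fmt.Errorf("invalid address or range %q", ipRange)
	}
	family := "ip6"
	if addr.Is4() {
		family = "ip"
	}
	field := "daddr"
	if inbound {
		field = "saddr"
	}
	return fmt.Sprintf("%s %s %s", family, field, ipRange), nil
}

func main() {
	fmt.Println(portMatch("tcp", 8080, 9000))
	for _, r := range []string{"100.100.0.100-100.100.0.120", "100.100.0.128/25", "100.100.0.10"} {
		m, err := addrMatch(true, r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(m)
	}
}
```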
Here is a functioning code example of the proposed default security rules with comments inline:
// default explicit permit ipv4 any rule
explicitPermitIPv4Rule := models.SecurityRuleJson{
	IpProtocol: "ipv4", // Proto
	FromPort:   0,      // Starting Port Range
	ToPort:     0,      // Ending Port Range
	// IpRanges are any v4 or v6 addresses. Supported types are the same as nftables, such as:
	// 192.168.1.1, 192.168.1.0/24, 192.168.1.10-192.168.1.20,
	// 2003:0db8:0000:0000:0000:0000:0000:0000-2003:0db8:ffff:ffff:ffff:ffff:ffff:ffff,
	// 2001:0db8:1337:cafe::/64, fd00:face:b00c:cafe::4 etc.
	// Essentially, range x-y, cidr notation or individual addresses
	IpRanges: []string{},
}

// default explicit permit ipv6 any rule
explicitPermitIPv6Rule := models.SecurityRuleJson{
	IpProtocol: "ipv6",
	FromPort:   0,
	ToPort:     0,
	IpRanges:   []string{},
}

// default explicit permit icmp any rule
explicitPermitIcmpIPv4Rule := models.SecurityRuleJson{
	IpProtocol: "icmp",
	FromPort:   0,
	ToPort:     0,
	IpRanges:   []string{},
}

// default explicit permit icmpv6 any rule
explicitPermitIcmpIPv6Rule := models.SecurityRuleJson{
	IpProtocol: "icmpv6",
	FromPort:   0,
	ToPort:     0,
	IpRanges:   []string{},
}

// inbound permits everything by default; outbound is left empty (implicit allow-all)
inboundRules := []models.SecurityRuleJson{explicitPermitIPv4Rule, explicitPermitIPv6Rule, explicitPermitIcmpIPv4Rule, explicitPermitIcmpIPv6Rule}
var outboundRules []models.SecurityRuleJson

inboundRulesJSON, err := json.Marshal(inboundRules)
if err != nil {
	return models.SecurityGroup{}, fmt.Errorf("error marshalling inbound rules: %w", err)
}

outboundRulesJSON, err := json.Marshal(outboundRules)
if err != nil {
	return models.SecurityGroup{}, fmt.Errorf("error marshalling outbound rules: %w", err)
}
Example default nftables table from the code above with comments inline. All rules are applied only to the driver interface:
table inet nexodus { # nftables table name
	chain nexodus-inbound { # ingress chain name
		type filter hook input priority filter; policy accept; # ingress chain policy
		ct state established,related iifname "wg0" counter packets 59 bytes 11407 accept # established ct tracking
		icmpv6 type { echo-request, echo-reply } iifname "wg0" counter packets 2 bytes 112 accept # permit icmpv6
		icmp type { echo-reply, echo-request } iifname "wg0" counter packets 1 bytes 84 accept # permit icmpv4
		meta nfproto ipv4 iifname "wg0" counter packets 1 bytes 64 accept # permit ipv4
		meta nfproto ipv6 iifname "wg0" counter packets 2 bytes 168 accept # permit ipv6
		iifname "wg0" counter packets 0 bytes 0 drop # drop any other traffic
	}
	chain nexodus-outbound { # egress chain name
		type filter hook output priority filter; policy accept; # egress chain policy
	}
}
Security Group User Interface
- The user can add rules via the web UI, the nexctl tool, or the HTTP API. Ideally, the credentials used by the agent would not have write access to these policies. However, until we have device-specific auth tokens available, there is no way to prevent it. That is a security feature tracked in issue #647.
- Users can modify the rules installed by Nexodus on the device if they have administrative access to nftables. The Nexodus Agent must reconcile what is currently applied to nftables with what is intended. This means that manual changes to the nexodus table are subject to being overwritten by the Nexodus Agent.
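The reconciliation requirement above could be approached by comparing the rules currently listed in the chain with the intended rules and re-applying when they differ. The sketch below assumes the applied rules have already been read back as text lines (how they are obtained is out of scope here), and the names normalize and inSync are hypothetical. Listed rules carry live counters such as counter packets 59 bytes 11407, so those are stripped before comparing.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// counterRe matches the live packet/byte counters printed for each rule;
// they change constantly and must be ignored when comparing rulesets.
var counterRe = regexp.MustCompile(`counter packets \d+ bytes \d+`)

// normalize strips counters and surrounding whitespace from one rule line.
func normalize(rule string) string {
	return strings.TrimSpace(counterRe.ReplaceAllString(rule, "counter"))
}

// inSync reports whether the applied rules match the intended rules,
// in order, after normalization.
func inSync(applied, intended []string) bool {
	if len(applied) != len(intended) {
		return false
	}
	for i := range applied {
		if normalize(applied[i]) != normalize(intended[i]) {
			return false
		}
	}
	return true
}

func main() {
	intended := []string{`iifname "wg0" tcp dport 22 counter accept`}
	applied := []string{`iifname "wg0" tcp dport 22 counter packets 59 bytes 11407 accept`}
	fmt.Println(inSync(applied, intended))

	// A rule added by hand makes the chain drift from the intended state.
	applied = append(applied, `iifname "wg0" tcp dport 8080 counter packets 0 bytes 0 accept`)
	fmt.Println(inSync(applied, intended))
}
```

When inSync returns false, the agent would clear the chain and re-apply the intended rules, which is what overwrites manual changes.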
Rule Deconfliction
- The deconfliction of user-provided rules is managed by nft. Let's look at the following example, where a user defines a permit icmp6 any rule and an icmp6 2001:0db8:1337:cafe::/64 rule. The JSON would look as follows:
{
  "ip_protocol": "icmpv6",
  "from_port": 0,
  "to_port": 0,
  "ip_ranges": [
    "2001:0db8:1337:cafe::/64"
  ]
},
{
  "ip_protocol": "icmpv6",
  "from_port": 0,
  "to_port": 0
}
The actual rule in nftables would only insert the LPM (Longest Prefix Match). The resulting chain would look as follows.
table inet nexodus {
	chain nexodus-inbound {
		type filter hook input priority filter; policy accept;
		iifname "wg0" ct state established,related counter packets 0 bytes 0 accept
	}
	chain nexodus-outbound {
		type filter hook output priority filter; policy accept;
	}
}
The same LPM rule optimizations also apply to IPv4 and IPv6.
New Tables
A new table will be defined for SecurityGroups
SecurityGroups
- ID
- Name
- Description
- InboundRules (SecurityRules JSON)
- OutboundRules (SecurityRules JSON)
- OrganizationID (ZeroOrOne)
- UserID (ZeroOrOne)
SecurityRules
Security rules will not be stored in a new database table. Instead, inbound and outbound rules are stored as JSON in the corresponding SecurityGroups fields.
- ID
- SecurityGroupID
- Protocol
- FromPort
- ToPort
- Destination Address
- Destination Port
- OrganizationID
CRUD Actions with API for Security Groups
In this section, we will outline the CRUD actions that can be performed using the API for managing security groups.
Create Security Group
To create a new security group for an organization or user, send a POST request to the following endpoint with the required data:
Endpoint:
POST /organizations/$org_id/security_groups
POST /users/$org_id/security_groups
Payload:
{
  "group_name": "Example Security Group",
  "group_description": "A sample security group for demonstration purposes",
  "inbound_rules": [
    {
      "ip_protocol": "tcp",
      "from_port": 22,
      "to_port": 22,
      "ip_ranges": ["172.16.100.0/24"]
    }
  ],
  "outbound_rules": []
}
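As an illustration of the create call, the following sketch builds the POST request for the organization endpoint with the payload shown above. The base URL, the createGroupRequest type, and the newCreateRequest helper are hypothetical and exist only for this example; authentication is omitted, and the request is constructed but not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// SecurityRule mirrors the rule JSON used in the payload.
type SecurityRule struct {
	IpProtocol string   `json:"ip_protocol"`
	FromPort   int64    `json:"from_port"`
	ToPort     int64    `json:"to_port"`
	IpRanges   []string `json:"ip_ranges,omitempty"`
}

// createGroupRequest is a hypothetical client-side view of the payload.
type createGroupRequest struct {
	GroupName        string         `json:"group_name"`
	GroupDescription string         `json:"group_description"`
	InboundRules     []SecurityRule `json:"inbound_rules"`
	OutboundRules    []SecurityRule `json:"outbound_rules"`
}

// newCreateRequest builds (but does not send) the POST request.
func newCreateRequest(baseURL, orgID string, body createGroupRequest) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, fmt.Errorf("error marshalling security group: %w", err)
	}
	url := baseURL + "/organizations/" + orgID + "/security_groups"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, fmt.Errorf("error building request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateRequest("https://api.example.com", "my-org-id", createGroupRequest{
		GroupName:        "Example Security Group",
		GroupDescription: "A sample security group for demonstration purposes",
		InboundRules: []SecurityRule{
			{IpProtocol: "tcp", FromPort: 22, ToPort: 22, IpRanges: []string{"172.16.100.0/24"}},
		},
		OutboundRules: []SecurityRule{}, // empty slice so the field encodes as [] rather than null
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```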
Read Security Group(s)
To get a list of security groups for an organization or user, send a GET request to the following endpoint:
GET /organizations/$org_id/security_groups
GET /users/$org_id/security_groups
To get detailed information about a specific security group for an organization or user, send a GET request to the following endpoint:
GET /organizations/$org_id/security_groups/$sg_id
GET /users/$org_id/security_groups/$sg_id
Update Security Group
To update a security group for an organization or user, send a PATCH request to the following endpoint with the updated data:
Endpoint:
PATCH /organizations/$org_id/security_groups/$sg_id
PATCH /users/$org_id/security_groups/$sg_id
Payload:
{
  "group_name": "Updated Security Group",
  "group_description": "An updated security group for demonstration purposes",
  "inbound_rules": [
    {
      "ip_protocol": "tcp",
      "from_port": 22,
      "to_port": 22,
      "ip_ranges": ["10.100.0.0/20"]
    },
    {
      "ip_protocol": "udp",
      "from_port": 53,
      "to_port": 53,
      "ip_ranges": ["0.0.0.0/0"]
    }
  ],
  "outbound_rules": []
}
Delete Security Group
To delete a security group for an organization or user, send a DELETE request to the following endpoint:
DELETE /organizations/$org_id/security_groups/$sg_id
DELETE /users/$org_id/security_groups/$sg_id
New and Modified Structs
Security Group Model
- New model
// SecurityGroup represents a security group containing security rules and a group owner
type SecurityGroup struct {
	Base
	GroupName        string    `json:"group_name"`
	GroupDescription string    `json:"group_description"`
	OrganizationId   uuid.UUID `json:"org_id"`
	InboundRules     string    `json:"inbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
	OutboundRules    string    `json:"outbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
}

// AddSecurityGroup is the information needed to add a new Security Group.
type AddSecurityGroup struct {
	GroupName        string    `json:"group_name" example:"group_name"`
	GroupDescription string    `json:"group_description" example:"group_description"`
	OrganizationId   uuid.UUID `json:"org_id"`
	InboundRules     string    `json:"inbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
	OutboundRules    string    `json:"outbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
}

// UpdateSecurityGroup is the information needed to update an existing Security Group.
type UpdateSecurityGroup struct {
	GroupName        string `json:"group_name,omitempty"`
	GroupDescription string `json:"group_description,omitempty"`
	InboundRules     string `json:"inbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
	OutboundRules    string `json:"outbound_rules,omitempty" gorm:"type:JSONB; serializer:json"`
}
Security Rules Struct
- New struct. Rather than making a new table for rules, we are embedding security rules as JSON into the security group columns inbound_rules and outbound_rules.
// SecurityRuleJson represents a security rule
type SecurityRuleJson struct {
	IpProtocol string   `json:"ip_protocol"`
	FromPort   int64    `json:"from_port"`
	ToPort     int64    `json:"to_port"`
	IpRanges   []string `json:"ip_ranges,omitempty"`
}
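Because rules are stored as free-form JSON rather than typed columns, the handler-side validation mentioned earlier could look roughly like the sketch below. The validate and validRange functions are hypothetical, and the set of accepted protocol names is an assumption drawn from the values used in this document, not a settled list.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// SecurityRuleJson represents a security rule
type SecurityRuleJson struct {
	IpProtocol string   `json:"ip_protocol"`
	FromPort   int64    `json:"from_port"`
	ToPort     int64    `json:"to_port"`
	IpRanges   []string `json:"ip_ranges,omitempty"`
}

// allowedProtocols is an assumed list based on the examples in this document.
var allowedProtocols = map[string]bool{
	"ipv4": true, "ipv6": true, "tcp": true, "udp": true, "icmp": true, "icmpv6": true,
}

// validRange accepts a single address, a CIDR prefix, or an x-y range.
func validRange(s string) bool {
	if lo, hi, ok := strings.Cut(s, "-"); ok {
		a, errA := netip.ParseAddr(lo)
		b, errB := netip.ParseAddr(hi)
		// Both ends must parse, share a family, and be in ascending order.
		return errA == nil && errB == nil && a.Is4() == b.Is4() && !b.Less(a)
	}
	if _, err := netip.ParsePrefix(s); err == nil {
		return true
	}
	_, err := netip.ParseAddr(s)
	return err == nil
}

// validate checks one rule before it is stored or rendered.
func validate(r SecurityRuleJson) error {
	if !allowedProtocols[r.IpProtocol] {
		return fmt.Errorf("unsupported protocol %q", r.IpProtocol)
	}
	if r.FromPort < 0 || r.ToPort > 65535 || r.FromPort > r.ToPort {
		return fmt.Errorf("invalid port range %d-%d", r.FromPort, r.ToPort)
	}
	for _, ipRange := range r.IpRanges {
		if !validRange(ipRange) {
			return fmt.Errorf("invalid ip range %q", ipRange)
		}
	}
	return nil
}

func main() {
	good := SecurityRuleJson{IpProtocol: "tcp", FromPort: 22, ToPort: 22, IpRanges: []string{"172.16.100.0/24"}}
	bad := SecurityRuleJson{IpProtocol: "tcp", FromPort: 9000, ToPort: 8080}
	fmt.Println(validate(good))
	fmt.Println(validate(bad))
}
```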
Other Table Changes
- Device
type ModelsDevice struct {
	...
	SecurityGroups []uuid.UUID `json:"security_groups,omitempty"`
}
- Organization: Every organization will receive a Security Group on creation.
type ModelsOrganization struct {
	...
	SecurityGroups []uuid.UUID `json:"security_groups,omitempty"`
}
- User
type ModelsUser struct {
	...
	SecurityGroups []uuid.UUID `json:"security_groups,omitempty"`
}
Alternatives Considered
- A primary alternative concerns how much control to expose to the user. Specifically, should the user have access to deny rules? The benefits are not obvious, and deny rules make rule order significant, as discussed in Default Security Group and Rules.
- iptables, the user-space application for managing netfilter, is the predominant ACL implementation today, but it is planned for deprecation across all major Linux distributions.