[ISN] ITL Bulletin for October 2004

Fri Oct 29 04:29:41 EDT 2004

Forwarded from: Elizabeth Lennon <elizabeth.lennon at nist.gov>

ITL BULLETIN FOR OCTOBER 2004

SECURING VOICE OVER INTERNET PROTOCOL (IP) NETWORKS
By Thomas J. Walsh and D. Richard Kuhn
National Institute of Standards and Technology
Technology Administration
U.S. Department of Commerce

Voice over IP (VOIP) - the transmission of voice over traditional
packet-switched IP networks - is one of the hottest trends in
telecommunications. As with any new technology, VOIP introduces both
opportunities and security challenges. Lower cost and greater
flexibility are among the promises of VOIP for the enterprise, but
security administrators will face significant issues. Administrators
may assume that since digitized voice travels in packets, they can
simply plug VOIP components into their already-secured networks and
expect a stable and secure voice network. Unfortunately, many of the
tools used to safeguard today's computer networks, namely firewalls,
Network Address Translation (NAT), and encryption, don't work "as is"
in a VOIP network.

VOIP systems take a wide variety of forms. Just about any computer is
capable of providing VOIP, and most users don't realize that they
already have basic VOIP applications.  Microsoft's NetMeeting, or the
newer Windows Messenger, which come with Windows platforms, provides
voice and video services, and Linux platforms have a number of VOIP
applications from which to choose. In general, though, the term Voice
Over IP is associated with equipment that provides the ability to dial
telephone numbers and communicate with parties on the other end who
may have either another VOIP system or a traditional analog telephone.
Demand for VOIP services has resulted in a broad array of products,
including:

* Traditional telephone handset - Usually these products have extra
  features beyond a simple handset with dial pad.  Some of these units
  may have a "base station" design that provides the same convenience 
  as a conventional cordless phone.

* Conferencing units - These provide the same type of service as
  conventional conference calling phone systems, but since 
  communication is handled over the Internet, they may allow users to 
  coordinate traditional data communication services, such as a 
  whiteboard that displays on computer monitors at both ends.

* Mobile units - Wireless VOIP units are becoming increasingly
  popular, especially since many organizations already have an 
  installed base of 802.11 networking equipment. Wireless VOIP 
  products present particularly acute security problems, given the 
  well-known weaknesses of the 802.11 family of protocols.

* PC or "softphone" - With a headset, software, and inexpensive
  connection service, any PC or workstation can be used as a VOIP 
  unit, often referred to as a "softphone."

In addition to end-user equipment, VOIP systems include specialized
components beyond those found on an ordinary IP network: call managers
and media/signaling gateways.  Call managers are required to set up
calls, monitor call state, handle number translation, and provide
basic telephony services. Call managers also handle signaling
functions that coordinate with media gateways, which are the interface
between the VOIP network and the public switched telephone network
(PSTN). Depending on the system, gateway functions may be implemented
as a board or dedicated appliance, or may be provided through a
distributed system of servers and databases.

Current VOIP systems use one of two protocols, H.323 or the Session
Initiation Protocol (SIP). SIP is the IETF specified protocol for
initiating a two-way communication session.  It was designed to be
simpler than H.323, but has become increasingly complex, as the
standard has evolved. SIP is text based; its messages are similar to
e-mail message formats. Also, SIP is an application level protocol,
that is, it is decoupled from the protocol layer it is transported
across. Unlike H.323, SIP uses only one port in the call setup
process. The architecture of a SIP network also differs from the H.323
structure. A SIP network is made up of end points, a proxy and/or
redirect server, location server, and registrar. In the SIP model, a
user is not bound to a specific host. Instead, users initially report
their location to a registrar, which may be integrated into a proxy or
redirect server. H.323 is the International Telecommunication Union
(ITU)  specification for audio and video communication across
packetized networks. H.323 acts as a wrapper for a suite of media
control recommendations by the ITU incorporating several other
protocols, including H.225 and H.245. Each of these protocols has a
specific role in the call setup process, and all but one make use of
dynamic ports. An H.323 network is made up of several endpoints
(terminals)  that are normally bound to a specific address, a gateway,
and possibly a gatekeeper, multipoint control unit, and back end
service. The gateway serves as a bridge between the H.323 network and
the outside world of (possibly)  non-H.323 devices, including SIP
networks and traditional PSTN networks.

Most VOIP components have counterparts used in data networks, but the
performance demands of VOIP mean that ordinary network software and
hardware must be supplemented with special VOIP components. One of the
main sources of confusion for those new to VOIP is the assumption that
because digitized voice travels in packets just like other data,
existing network architectures and tools can be used with little or no
change. Unfortunately, VOIP adds a number of complications to existing
network technology, and these problems are compounded by security
considerations.

What's Different About VOIP Security?

To understand why security for VOIP isn't the same as data network
security, we need to look at both the unique constraints of
transmitting voice over a packet network, and at characteristics
shared by VOIP and data networks.  Packet networks depend on a large
number of configurable parameters: IP and media access control (MAC)
(physical)  addresses of voice terminals, addresses of routers and
firewalls. VOIP networks add specialized software such as call
managers and other programs used to place and route calls. Many of the
network parameters are established dynamically every time a network
component is restarted, or when a VOIP telephone is restarted or added
to the network.  Because there are so many places in a VOIP network
with dynamically configurable parameters, intruders have as wide an
array of potentially vulnerable points to attack as they have with
data networks. But VOIP systems have much stricter performance
constraints than data networks, with significant implications for
security.

Quality of Service (QoS) is fundamental to the operation of a VOIP
network. A VOIP application is much more sensitive to delays than its
traditional data counterparts. If one downloads a file, a slowdown of
a few seconds is negligible. In contrast, a delay of merely 150
milliseconds is enough to turn a crisp VOIP call into a garbled,
unintelligible mess. In the VOIP vernacular, this is termed the
latency problem.

Latency turns traditional security measures into double-edged swords
for VOIP. Tools such as encryption and firewall protection can help
secure the network, but they also introduce a significant amount of
delay. Latency is not just a quality of service issue, but a security
issue as well, because it increases the system's susceptibility to a
Denial of Service (DoS) attack. For a DoS attack to succeed in a VOIP
network, it need not completely shut down the system. It must only
delay voice packets for a fraction of a second. The necessary
impediment is even less when latency-producing security devices are
slowing down traffic.

Another QoS issue, jitter, refers to non-uniform delays that can cause
packets to arrive and be processed out of sequence. Real-time
Transport Protocol (RTP), the protocol used to transport voice media,
is based on the User Datagram Protocol (UDP), so packets received out
of order cannot be reassembled at the transport level, and therefore
must be reordered at the application level, introducing a significant
overhead. Even when packets manage to arrive in order, high jitter
causes them to arrive at their destination in spurts. This scenario is
analogous to uniform road traffic coming to a stoplight. As soon as
the stoplight turns green (bandwidth opens up), traffic races through
in a clump.

Infrastructure issues become significant with a change to VOIP. With
conventional telephones, eavesdropping requires either physical access
to tap a line or penetration of a switch. Attempting physical access
increases the intruder's risk of being discovered, and conventional
private branch exchanges (PBXs) typically use proprietary protocols,
specialized software, and have fewer points of access than VOIP
systems. With VOIP, opportunities for eavesdroppers are multiplied.
VOIP units share physical network connections with the data network,
and in many cases, VOIP and data are on the same logical portion of
the network.  Protocols are standardized, and tools to monitor and
control packet networks are widely available. Attaching a packet
sniffer, such as the freely available "voice over misconfigured
internet telephony" (known by its unfortunate acronym "vomit"), to the
VOIP network segment makes it easy to intercept voice traffic.

Like other types of software, VOIP systems have been found to have
vulnerabilities due to buffer overflows and improper packet header
handling. Exploitable software flaws typically result in two types of
vulnerabilities: denial of service or disclosure of critical system
parameters. In some cases, the system can be crashed, producing a
memory dump in which an intruder can find IP addresses of critical
system nodes, passwords, or other security-relevant information.
Crashing a VOIP server may also result in a restart that restores
default passwords or falls prey to a rogue server attack. In addition,
buffer overflows that allow the introduction of malicious code have
been found in VOIP software, as in other applications.

Tradeoffs between convenience and security are routine in software,
and VOIP is no exception. Most, if not all, VOIP components use
integrated web servers for configuration.  Web interfaces can be
attractive, easy to use, and inexpensive to produce because of the
wide availability of good development tools. Unfortunately, most web
development tools are built with features and ease of use in mind,
with less attention to the security of the applications they help
produce. VOIP device web applications have been discovered with weak
or no access control, script vulnerabilities, and inadequate parameter
validation, resulting in privacy and denial of service
vulnerabilities.  As VOIP gains in popularity, with implementations on
devices of all types, it is almost inevitable that more administrative
web applications with exploitable errors will be found. What do the
Special Characteristics of VOIP Mean for Security?

Meeting the security challenges of VOIP can require changes to a
number of familiar security components.  Firewalls are a staple of
security in today's IP networks. Whether protecting a local-area
network (LAN), a wide-area network (WAN), encapsulating a
demilitarized zone (DMZ), or just protecting a single computer, a
firewall is usually the first line of defense. Firewalls work by
blocking traffic deemed to be malicious or potentially risky.
Acceptable traffic is determined by a set of rules programmed into the
firewall by the network administrator. These may include such commands
as "Block all FTP traffic (port 21)" or "Allow all http traffic (port
80)." Much more complex rule sets are available in almost all
firewalls. Firewalls also provide a central location for deploying
security policies, the ultimate bottleneck for network traffic,
because no traffic can enter or exit the LAN without passing through
the firewall.

This situation lends itself to the VOIP network where firewalls
simplify security management by consolidating security measures at the
firewall gateway, instead of requiring all the endpoints to maintain
up-to-date security policies. This takes an enormous burden off the
VOIP network infrastructure. Unfortunately, this abstraction and
simplification of security measures comes at a price. The introduction
of firewalls to the VOIP network complicates several aspects of VOIP,
most notably dynamic port trafficking and call setup procedures.
Several commercial solutions are available to alleviate this including
Application Level Gateways (ALGs), that make the firewall
"VOIP-aware," and Midcom Controls, which allow the firewall to be
traversed by allowing it to receive instruction from an
application-aware agent. That is, they can understand the VOIP
protocol data carried as a payload within an ordinary packet, making
it possible to do stateful filtering of call packets. Attempting to
implement a VOIP system on a legacy network without such devices is
generally not feasible.

Firewalls, gateways, and other such devices can help keep intruders
from compromising a network. However, these devices are no defense
against an internal hacker and don't protect voice data as it crosses
the Internet. Another layer of defense is necessary at the protocol
level to protect the data itself. In VOIP, as in data networks, this
can be accomplished by encrypting the packets at the IP level using
Internet Protocol Security (IPsec). This way, if anyone intercepts
VOIP traffic and is not the intended recipient (for instance, via a
packet sniffer), such packets would be unintelligible. The IPsec suite
of security protocols and encryption algorithms is the standard for
securing packets against unauthorized viewers over data networks and
will be supported by the protocol stack in IPv6. So it seems logical
to extend IPsec to VOIP, encrypting the signal and voice packets on
one end and decrypting them only when needed by their intended
recipient.  Unfortunately, the nature of the signaling protocols and
the VOIP network itself make it necessary for routers, proxies, and
other components to read the VOIP packets, so encryption is often done
at the gateways to a network, rather than the endpoints. Such a scheme
also allows the endpoints to be computationally simple and promotes
scalability as new encryption algorithms can be overlaid on the
network without upgrading the endpoints.  Several factors, including
the expansion of packet size, ciphering latency, and a lack of QoS
urgency in the cryptographic engine itself, can cause an excessive
amount of latency in the VOIP packet delivery. This leads to degraded
voice quality, so once again there is a tradeoff between security and
voice quality, and a need for speed.

Virtual private network (VPN) tunneling of VOIP has also become
popular recently, but the congestion and bottlenecks associated with
encryption suggest that this solution may not always be scalable.
Although great strides are being made in this area, the hardware and
software necessary to ensure call quality for encrypted voice traffic
may not be economically or architecturally viable for all enterprises
considering the move to VOIP.

What are the Prospects for Securing a VOIP Network?

Thus far, we have painted a fairly bleak picture of VOIP security. The
construction of a VOIP network is an intricate procedure that should
be studied in great detail before being attempted. Integrating a VOIP
system into an already congested or overburdened network could be
disastrous for an organization's technology infrastructure.  There is
no easy "one size fits all" solution to the issues discussed in this
bulletin. The use of VPNs, versus ALG-like solutions and the choice of
SIP or H.323 are decisions that must be made based on the specific
nature of the current network and the VOIP network to be. However, the
technical problems are solvable, and the establishment of a secure
implementation of VOIP is well worth the difficulty associated with
these solutions. To implement VOIP securely today, start with these
general guidelines, recognizing that practical considerations may
require adjustments for the organization:

* Put voice and data on logically separate networks.  Different
  subnets with separate RFC 1918 address blocks should be used for 
  voice and data traffic, with separate DHCP servers for each, to ease 
  the incorporation of intrusion detection and VOIP firewall 
  protection.

* At the voice gateway, which interfaces with the PSTN, disallow
  H.323, SIP, or Media Gateway Control Protocol (MGCP) connections 
  from the data network. Use strong authentication and access control 
  on the voice gateway system, as with any other critical network 
  management component.

* A mechanism to allow VOIP traffic through firewalls is required.
  There are a variety of protocol-dependent and independent solutions,
  including ALGs for VOIP protocols, Session Border Controllers, or
  other standards-based solutions. Stateful packet filters can track 
  the state of connections, denying packets that are not part of a 
  properly originated call.

* Use IPsec or Secure Shell (SSH) for all remote management and
  auditing access. If practical, avoid using remote management at all
  and do IP PBX access from a physically secure system.

* Use IPsec tunneling when available instead of IPsec transport
  because tunneling masks the source and destination IP addresses. 
  This secures communications against rudimentary traffic analysis 
  (i.e., determining who is calling each other).

* If performance is a problem, use encryption at the router or other
  gateway, not the individual endpoints, to provide for IPsec tunneling.
  Since some VOIP endpoints are not computationally powerful enough to
  perform encryption, placing this burden at a central point ensures 
  all VOIP traffic emanating from the enterprise network has been 
  encrypted. Newer IP phones are able to provide Advanced Encryption 
  Standard (AES) encryption at a reasonable cost.

* Look for IP Phones that can load digitally (cryptographically)
  signed images to guarantee the integrity of the software loaded onto
  the IP Phone.

* "Softphone" systems, which implement VOIP using an ordinary PC with
  a headset and special software, should be avoided, if possible, 
  where security or privacy are a concern. In addition to violating 
  the separation of voice and data, PC-based VOIP applications can be
  vulnerable to worms and viruses that are all too common on PCs, and
  may infect other parts of the network.

* Consider methods to "harden" any VoIP platform based on common
  operating systems such as Windows or Linux. This includes disabling
  unnecessary services and possibly using host-based intrusion 
  detection methods.

* Be especially diligent about maintaining patches and current
  versions of VOIP software.

* Analyze the impact of VOIP adoption on the rest of the
  organization's infrastructure, including issues such as backup 
  power, E-911 emergency location, and records retention policies 
  or other legal issues.

VOIP can be done securely, but the path is not smooth. It will likely
be several years before standards issues are settled and VOIP systems
become a mainstream commodity.  Until then, organizations should
proceed cautiously and not assume that VOIP components are just more
peripherals for the local network. Above all, it is important to keep
in mind the unique requirements of VOIP, acquiring the right hardware
and software to meet the challenges of VOIP security. For more
information on securing VOIP systems, see draft NIST Special
Publication 800-58, Security Considerations for Voice Over IP Systems,
at http://csrc.nist.gov/publications/nistpubs/index.html.

Disclaimer: Any mention of commercial products or reference to
commercial organizations is for information only; it does not imply
recommendation or endorsement by the National Institute of Standards
and Technology nor does it imply that the products mentioned are
necessarily the best available for the purpose.

Elizabeth B. Lennon
Writer/Editor
Information Technology Laboratory
National Institute of Standards and Technology
100 Bureau Drive, Stop 8900
Gaithersburg, MD 20899-8900
Telephone (301) 975-2832
Fax (301) 840-1357