VoIP Fundamentals: How Voice Over IP Actually Works

Updated May 2026 · By TechDirectory Editorial Team

In a nutshell: VoIP — Voice over IP — is just phone calls carried as data packets across an IP network instead of a dedicated telephone circuit. Two protocols do the heavy lifting: SIP handles signalling (ring, answer, hang up), RTP carries the actual audio. A codec compresses your voice into bits before sending and reconstructs it at the other end. Almost every business phone today is VoIP underneath, even when the user just sees a normal-looking desk phone.

What is VoIP?

VoIP stands for Voice over Internet Protocol. It's an umbrella term for any way of making phone calls where the call travels as IP packets over a computer network, rather than as an analogue signal down a copper pair, or as a digitised channel on the traditional phone system (PSTN, ISDN).

If you've ever made a call in Microsoft Teams, dialled out from Zoom, used a Cisco / Yealink / Poly desk phone in an office, used WhatsApp voice, or set up a softphone app on your laptop, you've used VoIP. Nearly every business phone system installed in Singapore since about 2015 runs on it too.

Why VoIP took over

VoIP didn't just replace traditional telephony for cost reasons. It replaced it because it removes the artificial separation between "voice" and "everything else IT."

How a call works, step by step

When you pick up a VoIP phone and dial a number, roughly this happens:

  1. Signalling. Your phone sends a SIP INVITE message to your PBX (or cloud calling service) saying "I want to call this number, here's how to reach me for audio."
  2. Routing. The PBX figures out where the call should go. Internal extension? It looks up the registered IP address. External number? It hands the call to a SIP trunk connected to a telecom carrier, which puts it onto the wider phone network.
  3. Ringing. The remote phone (or carrier) responds with SIP messages: 180 Ringing while the far end rings, then 200 OK when the call is answered. The caller confirms with an ACK.
  4. Audio. Once both ends are connected, a separate stream — RTP packets carrying chunks of compressed audio — flows directly between the endpoints (or via a media relay). This is the actual conversation.
  5. Hang up. Either side sends a SIP BYE. The audio stream stops. Done.

The two-protocol split (signalling vs media) is fundamental. You'll see it everywhere in VoIP: the "control plane" handled by SIP, the "data plane" handled by RTP.
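To make the signalling sequence concrete, the five steps above can be sketched as a tiny state machine. This is an illustrative model only: the `CallState` names and the `TRANSITIONS` table are assumptions made up for this example, and a real SIP stack handles many more messages (ACK, CANCEL, retransmissions, error responses).

```python
from enum import Enum


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"      # INVITE sent, nothing heard back yet
    RINGING = "ringing"      # 180 Ringing received
    CONNECTED = "connected"  # 200 OK received, RTP audio can flow
    ENDED = "ended"          # BYE sent or received


# Hypothetical transition table: (current state, SIP event) -> next state.
TRANSITIONS = {
    (CallState.IDLE, "INVITE"): CallState.CALLING,
    (CallState.CALLING, "180 Ringing"): CallState.RINGING,
    (CallState.CALLING, "200 OK"): CallState.CONNECTED,
    (CallState.RINGING, "200 OK"): CallState.CONNECTED,
    (CallState.CONNECTED, "BYE"): CallState.ENDED,
}


def run_call(events):
    """Walk a list of SIP events and return every state visited."""
    state = CallState.IDLE
    visited = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} while {state.value}")
        state = TRANSITIONS[key]
        visited.append(state)
    return visited


states = run_call(["INVITE", "180 Ringing", "200 OK", "BYE"])
print(" -> ".join(s.value for s in states))
# prints: idle -> calling -> ringing -> connected -> ended
```

Note that the audio itself never appears in this model. The state machine only tracks signalling, which is exactly the control-plane role SIP plays.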

Two protocols: SIP and RTP

SIP (Session Initiation Protocol) is the signalling protocol — it sets up, modifies, and tears down calls. It looks a lot like HTTP if you've ever read raw web traffic: text-based, with methods (INVITE, ACK, BYE) and response codes (200 OK, 404 Not Found, 486 Busy Here). SIP itself doesn't carry voice — it just negotiates the call.
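Because SIP is plain text, the first line of any message can be read with ordinary string handling. The sketch below classifies a start line as a request or a response. The function name `parse_start_line` and the example address are hypothetical, and a production parser would also validate the version and handle headers and the body.

```python
def parse_start_line(line):
    """Classify the first line of a SIP message as a request or a response."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError(f"not a SIP start line: {line!r}")
    if parts[0].startswith("SIP/"):
        # Response: "SIP/2.0 <status code> <reason phrase>"
        return {"kind": "response", "code": int(parts[1]), "reason": parts[2]}
    # Request: "<METHOD> <request URI> SIP/2.0"
    return {"kind": "request", "method": parts[0], "uri": parts[1]}


print(parse_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(parse_start_line("SIP/2.0 486 Busy Here"))
```

The first call reports a request with method `INVITE`, and the second a response with code `486` and reason `Busy Here`, mirroring the way an HTTP request line and status line are laid out.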

RTP (Real-time Transport Protocol) carries the actual media: the audio packets, or video packets if it's a video call. RTP is designed for real-time delivery. It numbers and timestamps every packet so the receiver can re-order what arrives and detect what's missing, but it doesn't retransmit lost packets. In a live conversation a resent packet would arrive too late to be played, so retransmission would only waste bandwidth.
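As a rough illustration of the sequence numbers and timestamps described above, the sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550, then re-orders a small batch of packets and reports which ones never arrived. The helper names, the SSRC value, and the simulated loss pattern are assumptions for this example. It ignores header extensions, CSRC lists, and 16-bit sequence wrap-around.

```python
import struct

# 12-byte fixed RTP header: flags, marker/payload type, seq, timestamp, SSRC.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(seq, timestamp, ssrc, payload, payload_type=0):
    """Build an RTP packet: version 2, no padding, extension, or CSRCs."""
    first_byte = 2 << 6  # version = 2 in the top two bits
    header = RTP_HEADER.pack(first_byte, payload_type & 0x7F, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def unpack_rtp(packet):
    """Return (sequence number, timestamp, payload) from an RTP packet."""
    _, _, seq, timestamp, _ = RTP_HEADER.unpack(packet[:RTP_HEADER.size])
    return seq, timestamp, packet[RTP_HEADER.size:]


def reorder_and_find_gaps(packets):
    """Sort received packets by sequence number and list the missing ones."""
    ordered = sorted(packets, key=lambda p: unpack_rtp(p)[0])
    seqs = [unpack_rtp(p)[0] for p in ordered]
    missing = sorted(set(range(seqs[0], seqs[-1] + 1)) - set(seqs))
    return ordered, missing


# Packets 100..104, each 160 timestamp ticks (20 ms at 8 kHz) apart.
sent = [pack_rtp(100 + i, 160 * i, 0x1234, b"\x00" * 160) for i in range(5)]
# Simulated network: packet 102 is lost, 103 and 104 arrive swapped.
received = [sent[0], sent[1], sent[4], sent[3]]

ordered, missing = reorder_and_find_gaps(received)
print([unpack_rtp(p)[0] for p in ordered], missing)
# prints: [100, 101, 103, 104] [102]
```

The receiver learns that packet 102 is missing but makes no attempt to ask for it again. In practice a phone would conceal the gap, for example by repeating or interpolating the previous audio frame.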

A typical VoIP call generates roughly 50 RTP packets per second per direction, each ~20 ms of audio. That's why low jitter and low packet loss matter so much for voice — see LAN vs WAN Basics for the network metrics that matter.
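To show what "jitter" means in numbers, here is a minimal sketch of a receiver-side jitter estimate in the style of the running average defined in RFC 3550. It compares the spacing at which packets were sent (every 20 ms) with the spacing at which they arrived. The function name and the sample arrival times are invented for illustration, and times are in milliseconds rather than RTP timestamp units.

```python
def interarrival_jitter(send_times_ms, arrival_times_ms):
    """Running jitter estimate: smoothed deviation in packet spacing."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        sent_gap = send_times_ms[i] - send_times_ms[i - 1]
        arrival_gap = arrival_times_ms[i] - arrival_times_ms[i - 1]
        deviation = abs(arrival_gap - sent_gap)
        # Move 1/16 of the way toward the newest deviation (RFC 3550 gain).
        jitter += (deviation - jitter) / 16
    return jitter


send_times = [0, 20, 40, 60]  # one packet every 20 ms, i.e. 50 per second

steady = interarrival_jitter(send_times, [50, 70, 90, 110])
uneven = interarrival_jitter(send_times, [50, 75, 88, 112])
print(steady, uneven)
```

The first stream arrives with a constant 50 ms delay, so its jitter is zero even though latency is not. The second stream has the same average delay but uneven spacing, which is what forces a phone to buffer audio before playing it.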

Codecs — turning sound into bits

A codec (coder–decoder) is the algorithm that compresses your voice into a bitstream small enough to ship efficiently across a network, and reconstructs it at the other end. Different codecs trade off bandwidth, sound quality, and CPU cost differently.

| Codec | Bandwidth (per call, both ways) | Quality | Notes |
| --- | --- | --- | --- |
| G.711 (PCMU / PCMA) | ~160 kbps | Toll quality (PSTN-equivalent) | The classic. Mandatory baseline for most systems and carriers. |
| G.722 | ~160 kbps | HD voice (wideband) | Doubles the audio frequency range vs G.711 at the same bit rate. Standard for HD voice between modern handsets. |
| G.729 | ~48 kbps | Acceptable | Narrowband, very compressed. Common over WAN where bandwidth is tight. |
| Opus | ~12–128 kbps adaptive | Excellent (HD / fullband) | Modern, open codec used by WebRTC, Teams, Zoom, Discord. |
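Per-call bandwidth is more than the codec's own bit rate, because every packet also carries IP, UDP, and RTP headers. The sketch below estimates IP-level bandwidth for a two-way call, assuming IPv4, 20 ms of audio per packet, and no Ethernet framing, header compression, or silence suppression. The function name is hypothetical, and the codec rates used are the nominal 64 kbps for G.711 and 8 kbps for G.729.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed header


def call_bandwidth_kbps(codec_kbps, packet_ms=20):
    """Estimate IP-level bandwidth for one call, both directions combined."""
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    one_way_bps = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second
    return 2 * one_way_bps / 1000


print(call_bandwidth_kbps(64))  # G.711: prints 160.0
print(call_bandwidth_kbps(8))   # G.729: prints 48.0
```

The fixed 40 bytes of headers matter far more for a heavily compressed codec: for G.729 they are twice the size of the 20-byte audio payload. Published figures vary depending on whether Ethernet framing is counted and whether one or both directions are included.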

If both endpoints support a higher-quality codec, they'll use it. If one only supports G.711, that's what gets used. This negotiation happens in the SIP setup, in a piece called the SDP (Session Description Protocol) body — basically each side advertises which codecs it can speak.
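The negotiation can be sketched in a few lines. The example below reads the audio line of an SDP offer and picks the first codec, in the caller's order of preference, that the other side also supports. The SDP text, addresses, and function names are made up for illustration. The payload type numbers are the static assignments for these codecs, and dynamically numbered codecs such as Opus (announced through `a=rtpmap` lines) are left out to keep the sketch short.

```python
# Static RTP payload type numbers for a few common audio codecs.
STATIC_PAYLOAD_TYPES = {0: "PCMU", 8: "PCMA", 9: "G722", 18: "G729"}

# Hypothetical SDP offer: the caller prefers G.722, then PCMU, then PCMA.
EXAMPLE_OFFER = """v=0
o=alice 1 1 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 9 0 8
"""


def offered_codecs(sdp):
    """Return codec names from the audio media line, in offer order."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Layout: m=audio <port> <protocol> <payload type> ...
            payload_types = [int(field) for field in line.split()[3:]]
            return [STATIC_PAYLOAD_TYPES[pt] for pt in payload_types
                    if pt in STATIC_PAYLOAD_TYPES]
    return []


def negotiate_codec(offered, supported):
    """Pick the first offered codec that the answering side supports."""
    for codec in offered:
        if codec in supported:
            return codec
    return None


offer = offered_codecs(EXAMPLE_OFFER)
print(negotiate_codec(offer, {"G722", "PCMU"}))  # prints: G722
print(negotiate_codec(offer, {"PCMU", "PCMA"}))  # prints: PCMU
```

When the answering side only speaks G.711, the call falls back to PCMU even though the caller would have preferred wideband audio. If there is no codec in common, the function returns `None`, and a real call would be rejected.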

What makes calls sound bad

Almost every VoIP call quality complaint traces back to the network. The three usual suspects are latency (delay), jitter (uneven packet arrival), and packet loss.

"Echo" — usually caused by acoustic feedback from a speakerphone, or by an analogue-to-digital conversion somewhere in the path — is the other common complaint. Modern phones and conferencing systems include echo cancellation that handles most cases.

How VoIP is deployed today

Three patterns cover almost every business deployment:

1. Cloud calling (UCaaS)

The entire phone system lives in a provider's cloud — Microsoft Teams Phone, Zoom Phone, RingCentral, 8x8, Dialpad, Cisco Webex Calling. You add users in a web portal, install the app on laptops and mobiles, optionally ship physical desk phones. Calls to outside numbers go through the provider's SIP trunks. Lowest friction, lowest upfront cost, monthly per-user pricing. The dominant pattern for new deployments in Singapore SMEs.

2. On-prem or hosted PBX + SIP trunks

A PBX server (often a virtual machine — 3CX, FreePBX, Avaya, Cisco UCM, Mitel) handles internal call routing. It connects to one or more SIP trunks from a Singapore carrier (Singtel, StarHub, M1, MyRepublic, plus specialists like ViewQwest) for outside calls. More control, more responsibility. Common for mid-market and enterprise — especially where there's complex call-flow logic or contact-centre integration.

3. Hybrid

Microsoft Teams (or similar) for collaboration plus a traditional PBX or contact-centre platform for high-volume call handling, with the two integrated by SIP. Common in industries like banking, hospitality, and large retail.

VoIP in Singapore — a few specifics
