
Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings

What will I learn
- You will learn what sentinel-terminated types are and why Zig has them;
- the difference between []u8, [:0]u8, [*:0]u8, and [*c]u8;
- how C strings (null-terminated) map to Zig's type system;
- converting between Zig slices and C strings safely;
- the std.mem.span() function for finding sentinel boundaries;
- sentinel-terminated arrays and their compile-time guarantees;
- working with C string APIs: strlen, strcmp equivalents in Zig;
- common pitfalls when mixing Zig and C string conventions.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Intermediate
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings (this post)
Welcome back! In episode #15 we covered build.zig -- how it's real Zig code (not YAML, not TOML), how the build graph works with steps and dependencies, how to link C libraries, how to use build.zig.zon for package management, and how cross-compilation is a first-class feature that Just Works. At the end I teased that we'd look at sentinel-terminated types and C strings -- at that mysterious colon-zero in [*:0]const u8. Well, here we are.
C strings are just pointers to bytes that happen to end with a zero byte. No length stored anywhere. No bounds checking. That design has been a persistent source of security vulnerabilities in C code for decades: many classic buffer overflow exploits boil down to someone passing a byte sequence to a function that expected a null terminator and didn't find one (or found one in the wrong place).
Zig does something clever: it encodes the sentinel value directly into the type system. A [:0]u8 is a slice of bytes that is guaranteed to have a zero byte sitting at position slice[slice.len]. The compiler enforces this. You get C compatibility without giving up safety. And this matters every single time you call a C function that expects a const char* -- which is, realistically, every time you do C interop. Because C interop is one of Zig's primary selling points (we saw @cImport and linkLibC() in ep015), understanding how the type system handles the boundary between Zig's safe slices and C's null-terminated pointers is essential.
Here we go!
Solutions to Episode 15 Exercises
Before we get into sentinels, here are the solutions to last episode's exercises on the build system. These are complete, buildable examples -- copy into the right files, zig build, done.
Exercise 1 -- Multi-binary build with shared module, named run steps, and test step:
// build.zig
const std = @import("std");
pub fn build(b: *std.Build) void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
// Shared module
const common_module = b.addModule("common", .{
.root_source_file = b.path("src/common.zig"),
.target = target,
.optimize = optimize,
});
// Binary 1: app
const app = b.addExecutable(.{
.name = "app",
.root_source_file = b.path("src/main.zig"),
.target = target,
.optimize = optimize,
});
app.root_module.addImport("common", common_module);
b.installArtifact(app);
const run_app = b.addRunArtifact(app);
run_app.step.dependOn(b.getInstallStep());
if (b.args) |args| run_app.addArgs(args);
const run_app_step = b.step("run-app", "Run the main application");
run_app_step.dependOn(&run_app.step);
// Binary 2: tool
const tool = b.addExecutable(.{
.name = "tool",
.root_source_file = b.path("src/tool.zig"),
.target = target,
.optimize = optimize,
});
tool.root_module.addImport("common", common_module);
b.installArtifact(tool);
const run_tool = b.addRunArtifact(tool);
run_tool.step.dependOn(b.getInstallStep());
if (b.args) |args| run_tool.addArgs(args);
const run_tool_step = b.step("run-tool", "Run the tool");
run_tool_step.dependOn(&run_tool.step);
// Tests for all three source files
const test_step = b.step("test", "Run all tests");
const common_tests = b.addTest(.{
.root_source_file = b.path("src/common.zig"),
.target = target,
.optimize = optimize,
});
test_step.dependOn(&b.addRunArtifact(common_tests).step);
const main_tests = b.addTest(.{
.root_source_file = b.path("src/main.zig"),
.target = target,
.optimize = optimize,
});
main_tests.root_module.addImport("common", common_module);
test_step.dependOn(&b.addRunArtifact(main_tests).step);
const tool_tests = b.addTest(.{
.root_source_file = b.path("src/tool.zig"),
.target = target,
.optimize = optimize,
});
tool_tests.root_module.addImport("common", common_module);
test_step.dependOn(&b.addRunArtifact(tool_tests).step);
}
The key insight: b.step("run-app", ...) and b.step("run-tool", ...) create named build steps so zig build run-app and zig build run-tool work independently. The shared module is declared once and imported into both binaries and their test targets.
Exercise 2 -- Custom -Dlog-level= build option with comptime enum:
// build.zig
const std = @import("std");
pub fn build(b: *std.Build) void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const LogLevel = enum { info, warn, err };
const log_level = b.option(
LogLevel,
"log-level",
"Logging verbosity (default: info)",
) orelse .info;
const exe = b.addExecutable(.{
.name = "log-demo",
.root_source_file = b.path("src/main.zig"),
.target = target,
.optimize = optimize,
});
const options = b.addOptions();
options.addOption(LogLevel, "log_level", log_level);
exe.root_module.addOptions("config", options);
b.installArtifact(exe);
const run_cmd = b.addRunArtifact(exe);
run_cmd.step.dependOn(b.getInstallStep());
const run_step = b.step("run", "Run the application");
run_step.dependOn(&run_cmd.step);
}
// src/main.zig
const config = @import("config");
const std = @import("std");
fn log(comptime level: @TypeOf(config.log_level), msg: []const u8) void {
const threshold = @intFromEnum(config.log_level);
if (@intFromEnum(level) >= threshold) {
const prefix = switch (level) {
.info => "[INFO]",
.warn => "[WARN]",
.err => "[ERR]",
};
std.debug.print("{s} {s}\n", .{ prefix, msg });
}
}
pub fn main() void {
log(.info, "Application starting up");
log(.warn, "This is a warning");
log(.err, "Something went wrong");
}
Because config.log_level is comptime-known, the compiler eliminates all log() calls below the threshold. Build with zig build run -Dlog-level=warn and the info message vanishes from the binary entirely -- not skipped at runtime, actually removed during compilation.
Exercise 3 -- build.zig.zon with zig-clap dependency: this exercise requires an internet connection and the exact hash depends on the clap version you fetch. The general structure:
// build.zig.zon (after running: zig fetch --save https://github.com/Hejsil/zig-clap/archive/refs/tags/0.9.1.tar.gz)
.{
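.name = .clap_demo, // Zig 0.14+ expects an enum literal here; it also requires a .fingerprint field, and the compiler suggests the value to add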
.version = "0.1.0",
.dependencies = .{
.clap = .{
.url = "https://github.com/Hejsil/zig-clap/archive/refs/tags/0.9.1.tar.gz",
.hash = "12200000000000000000000000000000000000000000000000000000000000000000",
// ^ zig fetch fills the real hash
},
},
.paths = .{ "build.zig", "build.zig.zon", "src" },
}
// build.zig
const std = @import("std");
pub fn build(b: *std.Build) void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const clap_dep = b.dependency("clap", .{
.target = target,
.optimize = optimize,
});
const exe = b.addExecutable(.{
.name = "clap-demo",
.root_source_file = b.path("src/main.zig"),
.target = target,
.optimize = optimize,
});
exe.root_module.addImport("clap", clap_dep.module("clap"));
b.installArtifact(exe);
const run_cmd = b.addRunArtifact(exe);
run_cmd.step.dependOn(b.getInstallStep());
if (b.args) |args| run_cmd.addArgs(args);
const run_step = b.step("run", "Run the application");
run_step.dependOn(&run_cmd.step);
}
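// src/main.zig
// API as documented in the zig-clap README; check it against the version you fetch.
const std = @import("std");
const clap = @import("clap");
pub fn main() !void {
// clap allocates while parsing, so it needs an allocator
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
// Turn the help text into a parameter table at compile time
const params = comptime clap.parseParamsComptime(
\\-n, --name <str> Your name
\\-c, --count <usize> Repeat count
\\-h, --help Show help
\\
);
var diag = clap.Diagnostic{};
var res = clap.parse(clap.Help, &params, clap.parsers.default, .{
.diagnostic = &diag,
.allocator = gpa.allocator(),
}) catch |err| {
// Print a readable message for bad arguments, then propagate the error
diag.report(std.io.getStdErr().writer(), err) catch {};
return err;
};
defer res.deinit();
const name = res.args.name orelse "world";
const count = res.args.count orelse 1;
for (0..count) |_| {
std.debug.print("Hello, {s}!\n", .{name});
}
}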
Run zig fetch --save <URL> first to fill in the real content hash, then zig build run -- --name hello --count 3 should print "Hello, hello!" three times. The content hash in the .zon file is the lock -- if the upstream tarball ever changes, the build refuses to proceed until you update the hash.
Right, that's build system homework done. On to sentinels ;-)
The sentinel type family
Zig has four types that matter when you deal with sentinel-terminated data: an array, a slice, and two kinds of pointer. Understanding the differences between them is the foundation for everything in this episode:
[N:0]u8 -- a fixed-size array of N bytes with a guaranteed zero byte at index N. The total storage is N + 1 bytes. This is what string literals are:
const std = @import("std");
const testing = std.testing;
test "string literal type" {
const hello = "hello";
// hello has type *const [5:0]u8
// 5 bytes of content + 1 sentinel byte = 6 bytes total
try testing.expectEqual(@as(usize, 5), hello.len);
try testing.expectEqual(@as(u8, 'h'), hello[0]);
try testing.expectEqual(@as(u8, 'o'), hello[4]);
try testing.expectEqual(@as(u8, 0), hello[5]); // sentinel accessible!
}
The :0 in the type signature means "terminated by zero". The sentinel byte is there but it's NOT counted in .len. You can access it at array[array.len] -- a position that would normally be out-of-bounds for a regular array. This is special: the compiler allows reading one past the end specifically because the type guarantees a sentinel lives there.
[:0]u8 -- a sentinel-terminated slice. Like a regular []u8 (pointer + length) but with the additional guarantee that slice[slice.len] == 0. This is the most common sentinel type you'll work with:
test "sentinel slice from string literal" {
const hello: [:0]const u8 = "hello";
try testing.expectEqual(@as(usize, 5), hello.len);
try testing.expectEqual(@as(u8, 0), hello[hello.len]);
// You can also slice a sentinel array to get a sentinel slice
const greeting: [5:0]u8 = .{ 'h', 'e', 'l', 'l', 'o' };
const slice: [:0]const u8 = &greeting;
try testing.expectEqual(@as(usize, 5), slice.len);
}
[*:0]u8 -- a sentinel-terminated many-pointer. This is a pointer with no known length -- it points to an unknown number of bytes that are guaranteed to end with a zero. This is the precise Zig type for a non-null C string (@cImport itself emits the looser [*c]u8, covered next). You can iterate forward until you hit the sentinel, but there's no .len field:
test "many-pointer sentinel" {
const hello: [:0]const u8 = "hello";
const ptr: [*:0]const u8 = hello.ptr;
// No .len available! Must scan for the sentinel.
var i: usize = 0;
while (ptr[i] != 0) : (i += 1) {}
try testing.expectEqual(@as(usize, 5), i);
}
[*c]u8 -- the C pointer type. This is the most permissive pointer type in Zig, designed specifically for C interop. It can be null (unlike regular Zig pointers), it has no sentinel guarantee, and it can be cast to and from almost anything. When @cImport translates C headers, char* becomes [*c]u8 and const char* becomes [*c]const u8:
test "c pointer basics" {
const hello: [:0]const u8 = "hello";
const c_ptr: [*c]const u8 = hello.ptr;
// C pointers can be null
const null_ptr: [*c]const u8 = null;
try testing.expect(null_ptr == null);
try testing.expect(c_ptr != null);
}
Here's the hierarchy from safest to most permissive:
| Type | Has length? | Has sentinel? | Can be null? | Safety |
|---|---|---|---|---|
| [:0]u8 | Yes | Yes | No | Safest -- bounds checked + sentinel |
| [*:0]u8 | No | Yes | No | Must scan for length, but sentinel guaranteed |
| [*c]u8 | No | No | Yes | Least safe -- C compatibility, anything goes |
| []u8 | Yes | No | No | Standard slice -- no sentinel awareness |
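One way to make the "Has length?" column concrete is to look at how much each type physically stores. This is a small sketch, assuming a typical target where a slice is laid out as pointer plus length (that is how current Zig compilers represent it, but I wouldn't rely on the exact layout in production code):

```zig
const std = @import("std");
const testing = std.testing;

test "what each type stores" {
    const word = @sizeOf(usize);

    // Slices carry pointer + length: two machine words.
    try testing.expectEqual(@as(usize, 2 * word), @sizeOf([]const u8));
    try testing.expectEqual(@as(usize, 2 * word), @sizeOf([:0]const u8));

    // Many-pointers and C pointers are a bare address: one word.
    try testing.expectEqual(@as(usize, word), @sizeOf([*:0]const u8));
    try testing.expectEqual(@as(usize, word), @sizeOf([*c]const u8));

    // A sentinel array stores its terminator inline: N + 1 bytes.
    try testing.expectEqual(@as(usize, 6), @sizeOf([5:0]u8));
}
```

The sentinel itself costs nothing in the slice or pointer: it lives in the pointed-to memory, and the type merely remembers that it is there.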
String literals are sentinel-terminated
This is one of those things that seems obvious once you know it but is actually quite important. Every string literal in Zig has type *const [N:0]u8 where N is the number of characters. The zero byte is part of the type. The compiler puts it there at compile time and the type system remembers it:
const std = @import("std");
const testing = std.testing;
test "string literal coercions" {
const literal = "hello";
// Type: *const [5:0]u8
// Coerces to sentinel slice
const s1: [:0]const u8 = literal;
try testing.expectEqual(@as(usize, 5), s1.len);
// Coerces to regular slice (loses sentinel info)
const s2: []const u8 = literal;
try testing.expectEqual(@as(usize, 5), s2.len);
// Coerces to sentinel many-pointer (loses length info)
const s3: [*:0]const u8 = literal;
_ = s3;
// Coerces to C pointer (loses everything)
const s4: [*c]const u8 = literal;
_ = s4;
}
This coercion chain goes from more info to less info. You can always go "down" the chain (losing guarantees), but going "up" requires explicit conversion because you're adding guarantees the compiler can't verify automatically. Going from []const u8 back to [:0]const u8 -- the compiler can't know there's a zero byte at the end unless you tell it. Going from [*:0]const u8 to [:0]const u8 -- the compiler can't know the length unless you provide it.
The practical consequence: you can pass a Zig string literal directly to any C function that expects a [*:0]const u8 or [*c]const u8. The types are compatible without any casting:
const c = @cImport({
@cInclude("stdio.h");
});
pub fn main() void {
// "hello" is *const [5:0]u8, which coerces to [*c]const u8
_ = c.puts("hello");
// puts() expects const char* which Zig translates to [*c]const u8
}
No @ptrCast. No conversion function. The type system handles it because string literals already carry the sentinel guarantee that C functions need.
Converting between types
Going from more-info types to less-info types is automatic (coercion). Going the other direction requires explicit steps because you're asserting guarantees the compiler can't verify on its own.
Sentinel many-pointer to sentinel slice -- use std.mem.span():
const std = @import("std");
const testing = std.testing;
test "span converts sentinel pointer to slice" {
const ptr: [*:0]const u8 = "hello world";
const slice: [:0]const u8 = std.mem.span(ptr);
try testing.expectEqual(@as(usize, 11), slice.len);
try testing.expectEqualStrings("hello world", slice);
}
std.mem.span() scans forward from the pointer until it finds the sentinel (zero byte), counts the bytes, and returns a [:0]const u8 with the correct length. This is essentially the strlen operation but integrated into the type system -- the result carries the sentinel guarantee.
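span() needs a pointer whose type already promises a sentinel. A related case comes up when a C API fills a plain fixed-size buffer, where the type is just [N]u8. Here is a minimal sketch of how std.mem.sliceTo() can handle that, assuming the buffer was zero-initialised (the @memcpy is only a stand-in for the C call):

```zig
const std = @import("std");
const testing = std.testing;

test "sliceTo trims a fixed buffer at the first zero" {
    // Zero-filled buffer, as you might hand to a C function to fill in
    var buf = [_]u8{0} ** 16;
    @memcpy(buf[0..5], "hello"); // stand-in for the C call

    // Scans for the first 0 and returns everything before it
    const text = std.mem.sliceTo(&buf, 0);
    try testing.expectEqual(@as(usize, 5), text.len);
    try testing.expectEqualStrings("hello", text);
}
```

The result here is a regular []u8, because the buffer's type carries no sentinel for the compiler to propagate.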
Regular slice to sentinel slice -- use sentinel slicing syntax or an allocator-based copy:
test "adding sentinel to a regular slice" {
// If you KNOW there's a zero byte at the end:
var buf = [_]u8{ 'h', 'i', 0, 0, 0 };
const slice: []u8 = buf[0..2];
try testing.expectEqual(@as(usize, 2), slice.len); // plain slice: no sentinel in the type
// Slicing with :0 asserts that buf[2] == 0 and records it in the type
const sentinel_slice: [:0]u8 = buf[0..2 :0];
try testing.expectEqual(@as(usize, 2), sentinel_slice.len);
try testing.expectEqual(@as(u8, 0), sentinel_slice[sentinel_slice.len]);
}
The syntax buf[0..2 :0] is a sentinel slice operation. It creates a [:0]u8 from index 0 to 2, and asserts that buf[2] == 0. In safety-checked builds (Debug and ReleaseSafe), this assertion is checked at runtime -- if the byte at the sentinel position isn't zero, you get a panic with a clear message. In ReleaseFast and ReleaseSmall, the check is removed and you're on your own.
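If the data comes from somewhere you don't control, you may prefer to check for the terminator yourself instead of relying on the panic. A minimal sketch of that idea follows; asTerminated is a hypothetical helper name of my own, not something from the standard library:

```zig
const std = @import("std");
const testing = std.testing;

/// Hypothetical helper: returns buf[0..len] as a sentinel slice only if
/// a zero byte really sits at buf[len]; otherwise returns null.
fn asTerminated(buf: []u8, len: usize) ?[:0]u8 {
    if (len >= buf.len) return null; // no room for a sentinel
    if (buf[len] != 0) return null; // sentinel missing
    return buf[0..len :0]; // assertion can no longer fail
}

test "asTerminated checks before asserting" {
    var buf = [_]u8{ 'h', 'i', 0, 'x', 'y' };

    const ok = asTerminated(&buf, 2) orelse return error.TestUnexpectedResult;
    try testing.expectEqualStrings("hi", ok);

    try testing.expect(asTerminated(&buf, 3) == null); // buf[3] is 'x'
    try testing.expect(asTerminated(&buf, 5) == null); // past the end
}
```

Returning an optional pushes the "is it really terminated?" question to the caller, who then handles it with an ordinary orelse.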
Allocating a sentinel-terminated copy:
test "allocating sentinel-terminated string" {
const allocator = testing.allocator;
const source: []const u8 = "hello";
// allocSentinel allocates len + 1 bytes and sets the last to sentinel
const copy = try allocator.allocSentinel(u8, source.len, 0);
defer allocator.free(copy);
@memcpy(copy, source);
try testing.expectEqual(@as(usize, 5), copy.len);
try testing.expectEqual(@as(u8, 0), copy[copy.len]);
try testing.expectEqualStrings("hello", copy);
}
allocator.allocSentinel(u8, 5, 0) allocates 6 bytes (5 + 1 for the sentinel), sets the last byte to 0, and returns a [:0]u8. The return type carries the sentinel guarantee, so you can pass the result directly to C functions.
This is the safe way to create null-terminated strings from Zig slices. If you have a []const u8 (say, from a file read or a network buffer) and need to pass it to a C function, allocSentinel + @memcpy is the pattern.
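For the common case of "copy this slice and terminate it", std.mem.Allocator also offers dupeZ, which does the allocation and the copy in one call. A short sketch of the same operation as above, using the testing allocator:

```zig
const std = @import("std");
const testing = std.testing;

test "dupeZ copies a slice and appends the terminator" {
    const allocator = testing.allocator;
    const source: []const u8 = "hello";

    // Allocates source.len + 1 bytes, copies, and writes the 0 sentinel
    const copy = try allocator.dupeZ(u8, source);
    defer allocator.free(copy);

    try testing.expectEqual(@as(usize, 5), copy.len);
    try testing.expectEqual(@as(u8, 0), copy[copy.len]);
    try testing.expectEqualStrings("hello", copy);
}
```

Knowing the manual allocSentinel + @memcpy version is still worthwhile, since it's what you fall back to when the bytes come from several places.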
Calling C functions from Zig
This is where sentinel types earn their keep. Let's call some standard C library functions and see how the type system mediates the boundary:
const std = @import("std");
const c = @cImport({
@cInclude("string.h");
@cInclude("stdlib.h");
});
pub fn main() void {
// strlen -- takes [*c]const u8, returns usize (C's size_t)
const msg: [:0]const u8 = "hello world";
const len = c.strlen(msg);
std.debug.print("strlen says: {d}\n", .{len});
// strcmp -- takes two [*c]const u8, returns c_int
const a: [:0]const u8 = "apple";
const b: [:0]const u8 = "banana";
const cmp = c.strcmp(a, b);
if (cmp < 0) {
std.debug.print("'{s}' comes before '{s}'\n", .{ a, b });
}
// getenv -- takes [*c]const u8, returns [*c]u8 (may be null)
const path_ptr = c.getenv("PATH");
if (path_ptr) |ptr| {
const path = std.mem.span(ptr);
std.debug.print("PATH has {d} characters\n", .{path.len});
} else {
std.debug.print("PATH not set\n", .{});
}
}
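Notice the getenv return type. C's getenv returns char*, which could be NULL (if the variable doesn't exist). @cImport translates this as [*c]u8 -- a C pointer, which is allowed to be null. (Hand-written bindings such as std.c.getenv declare the same return value as ?[*:0]u8, an optional sentinel many-pointer, which forces you to handle the null case.) C pointers support the same if (path_ptr) |ptr| unwrapping as optionals, so inside the branch ptr is known to be non-null and you can convert it to a proper Zig slice with std.mem.span(). With a [*c] pointer the compiler won't insist on the check, so make it a habit. In C, you'd just... hope you remembered to check.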
The pattern for consuming C string return values is almost always the same: check for null, then std.mem.span() to get a usable Zig slice:
fn getEnvOrDefault(name: [:0]const u8, default: []const u8) []const u8 {
const ptr = c.getenv(name);
if (ptr) |p| {
return std.mem.span(p);
}
return default;
}
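If you'd rather have the compiler insist on the null check, one option is to convert the C pointer into an optional sentinel slice at the boundary. A minimal sketch, with fromCString being a hypothetical helper name of my own; it needs no libc, so the test uses plain literals in place of a real getenv result:

```zig
const std = @import("std");
const testing = std.testing;

/// Hypothetical helper: turns a possibly-null C string into an optional
/// sentinel slice, so callers get Zig's usual optional handling.
fn fromCString(ptr: [*c]const u8) ?[:0]const u8 {
    if (ptr == null) return null;
    return std.mem.span(ptr); // scans for the 0 terminator
}

test "fromCString" {
    const present: [*c]const u8 = "PATH=/bin";
    const missing: [*c]const u8 = null;

    const s = fromCString(present) orelse return error.TestUnexpectedResult;
    try testing.expectEqualStrings("PATH=/bin", s);
    try testing.expect(fromCString(missing) == null);
}
```

After this conversion the rest of the program only ever sees ?[:0]const u8, and forgetting the null case becomes a compile error.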
Building a C-compatible string utility
A common pattern in real Zig code is writing functions that need to work with both pure Zig callers (who have []const u8 slices) and C interop callers (who have [:0]const u8 or [*:0]const u8). Here's how to handle both:
const std = @import("std");
const testing = std.testing;
/// Counts occurrences of a byte in a sentinel-terminated string.
/// Accepts [:0]const u8 for C compatibility.
fn countByteZ(haystack: [:0]const u8, needle: u8) usize {
var count: usize = 0;
for (haystack) |byte| {
if (byte == needle) count += 1;
}
return count;
}
/// Counts occurrences of a byte in a regular Zig slice.
/// Accepts []const u8 for pure Zig callers.
fn countByte(haystack: []const u8, needle: u8) usize {
var count: usize = 0;
for (haystack) |byte| {
if (byte == needle) count += 1;
}
return count;
}
test "both versions work with string literals" {
// String literal coerces to either type
try testing.expectEqual(@as(usize, 3), countByteZ("hello world", 'l'));
try testing.expectEqual(@as(usize, 3), countByte("hello world", 'l'));
}
test "sentinel version works with C strings" {
const c_string: [*:0]const u8 = "hello world";
const slice = std.mem.span(c_string);
try testing.expectEqual(@as(usize, 3), countByteZ(slice, 'l'));
}
The difference: countByteZ takes [:0]const u8 which means callers can pass string literals and C strings directly (after std.mem.span()). countByte takes []const u8 which means callers can pass any slice, including subslices that aren't null-terminated; because [:0]const u8 coerces to []const u8, it accepts sentinel slices too. You could also write a single generic version using anytype, but Zig has no function overloading, and having two explicitly named functions makes the API contract clearer -- the caller knows exactly what's expected.
Here's a more practical example -- a function that duplicates a C string into Zig-managed memory:
const std = @import("std");
const testing = std.testing;
fn dupeZ(
allocator: std.mem.Allocator,
source: [*:0]const u8,
) ![:0]u8 {
const slice = std.mem.span(source);
const copy = try allocator.allocSentinel(u8, slice.len, 0);
@memcpy(copy, slice);
return copy;
}
test "dupeZ creates owned copy" {
const original: [*:0]const u8 = "hello";
const copy = try dupeZ(testing.allocator, original);
defer testing.allocator.free(copy);
try testing.expectEqualStrings("hello", copy);
try testing.expectEqual(@as(u8, 0), copy[copy.len]);
// It's a real copy -- different memory addresses
try testing.expect(copy.ptr != @as([*]const u8, original));
}
The standard library already covers the slice case with allocator.dupeZ(u8, slice) (a method on std.mem.Allocator, alongside the other duplicating functions), but building it yourself shows the pattern: span() to get the length, allocSentinel to allocate with the terminator, @memcpy to copy the data. The returned [:0]u8 is owned by the caller and must be freed.
Sentinel-terminated arrays and compile-time guarantees
Beyond slices and pointers, Zig supports sentinel-terminated fixed-size arrays. These give you compile-time guarantees about the sentinel's presence:
const std = @import("std");
const testing = std.testing;
test "sentinel array basics" {
// Explicit sentinel array
const greeting: [5:0]u8 = .{ 'h', 'e', 'l', 'l', 'o' };
try testing.expectEqual(@as(usize, 5), greeting.len);
try testing.expectEqual(@as(u8, 0), greeting[5]); // sentinel
// The compiler ensures the sentinel is there at comptime
// This would be a compile error:
// const bad: [3:0]u8 = .{ 'a', 'b', 'c', 'd' }; // wrong length
// You can create a sentinel slice from a regular array
var buf: [10]u8 = undefined;
@memcpy(buf[0..5], "hello");
buf[5] = 0;
const terminated: [:0]u8 = buf[0..5 :0];
try testing.expectEqualStrings("hello", terminated);
}
The compile-time guarantee is the important part. A [5:0]u8 is always 6 bytes with a zero at index 5. The compiler won't let you create one without the sentinel. This means when you pass a [N:0]u8 to a function, the function can trust the sentinel exists without runtime checking.
This matters for embedded programming and performance-critical code where you want compile-time verification that your string constants are properly terminated. Instead of hoping you remembered to add \0 at the end (like in C), the type system does it for you.
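The guarantee also survives compile-time string building. Here is a small sketch showing that concatenating two literals with ++ yields another sentinel-terminated array pointer (the names prefix, name and full are purely illustrative):

```zig
const std = @import("std");
const testing = std.testing;

const prefix = "lib";
const name = "zig";
// Concatenation happens at comptime; the result keeps its sentinel.
const full = prefix ++ name;

test "comptime concatenation keeps the sentinel" {
    try testing.expect(@TypeOf(full) == *const [6:0]u8);
    try testing.expectEqual(@as(u8, 0), full[full.len]);

    // Still usable as a C string, with no cast and no allocation
    const c_ready: [*:0]const u8 = full;
    try testing.expectEqual(@as(usize, 6), std.mem.len(c_ready));
}
```

So constants assembled at build time can go straight to a C API without any runtime work to terminate them.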
Splitting strings and working with sentinel data
One common operation is splitting a sentinel-terminated string. The standard library's std.mem.splitScalar works with regular slices, so you need to convert first:
const std = @import("std");
const testing = std.testing;
test "splitting a sentinel string" {
const path: [:0]const u8 = "/usr/local/bin:/usr/bin:/bin";
// [:0]const u8 coerces to []const u8 for split
var iter = std.mem.splitScalar(u8, path, ':');
var parts: [10][]const u8 = undefined;
var count: usize = 0;
while (iter.next()) |part| {
parts[count] = part;
count += 1;
}
try testing.expectEqual(@as(usize, 3), count);
try testing.expectEqualStrings("/usr/local/bin", parts[0]);
try testing.expectEqualStrings("/usr/bin", parts[1]);
try testing.expectEqualStrings("/bin", parts[2]);
}
The slices returned by splitScalar are regular []const u8 -- they don't carry sentinel guarantees because the split boundaries aren't necessarily at sentinel positions. If you need to pass one of these substrings to a C function, you'd need to allocate a sentinel-terminated copy:
const std = @import("std");
const testing = std.testing;
fn splitAndGetFirst(
allocator: std.mem.Allocator,
input: [:0]const u8,
delimiter: u8,
) ![:0]u8 {
var iter = std.mem.splitScalar(u8, input, delimiter);
const first = iter.next() orelse return error.Empty;
// Allocate a sentinel-terminated copy for C compat
const copy = try allocator.allocSentinel(u8, first.len, 0);
@memcpy(copy, first);
return copy;
}
test "split and get first as C-compatible string" {
const path: [:0]const u8 = "/usr/local/bin:/usr/bin:/bin";
const first = try splitAndGetFirst(testing.allocator, path, ':');
defer testing.allocator.free(first);
try testing.expectEqualStrings("/usr/local/bin", first);
try testing.expectEqual(@as(u8, 0), first[first.len]);
}
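When the substring is short and only needed for the duration of one call, a stack buffer can replace the heap allocation. This is a sketch using std.fmt.bufPrintZ, assuming a 64-byte buffer is large enough for the input (it returns error.NoSpaceLeft otherwise):

```zig
const std = @import("std");
const testing = std.testing;

test "terminate a substring in a stack buffer" {
    const path: []const u8 = "/usr/local/bin:/usr/bin:/bin";
    var iter = std.mem.splitScalar(u8, path, ':');
    const first = iter.first(); // plain []const u8, no sentinel

    // Copy into a fixed buffer and append the 0 terminator
    var buf: [64]u8 = undefined;
    const z = try std.fmt.bufPrintZ(&buf, "{s}", .{first});

    try testing.expectEqualStrings("/usr/local/bin", z);
    try testing.expectEqual(@as(u8, 0), z[z.len]);
}
```

The returned [:0]u8 points into buf, so it must not outlive the buffer; for anything longer-lived, the allocator version above is the safer choice.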
Common mistakes and how to avoid them
After years of C programming and a fair bit of Zig now, these are the sentinel-related mistakes I see most often:
Mistake 1: Forgetting the sentinel when allocating.
// WRONG: allocates 5 bytes, no room for sentinel
const bad = try allocator.alloc(u8, 5);
@memcpy(bad, "hello");
// bad is []u8, NOT [:0]u8 -- no sentinel!
// Passing this to a C function = undefined behaviour
// CORRECT: allocSentinel allocates 6 bytes, sets sentinel
const good = try allocator.allocSentinel(u8, 5, 0);
@memcpy(good, "hello");
// good is [:0]u8 -- sentinel guaranteed
Mistake 2: Off-by-one with sentinel length.
The .len of a sentinel slice does NOT include the sentinel byte. So "hello" has .len == 5 but occupies 6 bytes in memory. When calculating buffer sizes, you sometimes need len + 1:
// Copying a sentinel string to a fixed buffer:
var buf: [6]u8 = undefined; // NOT [5]u8 -- need room for sentinel!
const src: [:0]const u8 = "hello";
@memcpy(buf[0..5], src[0..5]);
buf[5] = 0;
Mistake 3: Passing a non-terminated slice to C.
const data: []const u8 = some_function(); // No sentinel guarantee!
// WRONG: data might not be null-terminated
// _ = c.strlen(@ptrCast(data.ptr)); // UB if no sentinel
// CORRECT: create a sentinel copy
const z = try allocator.allocSentinel(u8, data.len, 0);
defer allocator.free(z);
@memcpy(z, data);
_ = c.strlen(z);
The compiler actually helps here. If a function parameter is [*:0]const u8 or [:0]const u8, you can't pass a regular []const u8 -- the types are incompatible and the compiler rejects it. This is the whole point: the type system catches the bug at compile time before it becomes a runtime memory corruption.
Mistake 4: Assuming std.mem.span() is free.
span() has to scan forward until it finds the sentinel, so its cost grows linearly with the length of the string. For a 1MB C string, that means reading about a million bytes. If you already know the length (from a previous strlen call or from context), pass it directly instead of rescanning:
// Scanning twice -- wasteful
const len = c.strlen(ptr);
const slice = std.mem.span(ptr);
// Better -- scan once, construct slice manually
const len = c.strlen(ptr);
const slice = ptr[0..len :0];
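To check that the manual construction really gives the same result as span(), here is a self-contained sketch. It uses std.mem.len as a stand-in for c.strlen so that it runs without linking libc:

```zig
const std = @import("std");
const testing = std.testing;

test "scan once, then slice" {
    const ptr: [*:0]const u8 = "hello world";

    // One scan to find the length (stand-in for c.strlen)
    const len = std.mem.len(ptr);

    // Build the sentinel slice from the known length, no second scan
    const slice: [:0]const u8 = ptr[0..len :0];

    try testing.expectEqual(@as(usize, 11), slice.len);
    try testing.expectEqualStrings(std.mem.span(ptr), slice);
}
```

In safety-checked builds the :0 in the slice expression still verifies that ptr[len] is zero, which is a single byte read rather than a full rescan.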
Exercises
1. Write a function fn cstrlen(s: [*:0]const u8) usize that computes the length of a null-terminated C string by scanning byte-by-byte, without using std.mem.span() or std.mem.len() or any std library functions. Then write a second function fn cstrcmp(a: [*:0]const u8, b: [*:0]const u8) i32 that compares two C strings lexicographically (return negative if a < b, zero if equal, positive if a > b). Test both with various string pairs including empty strings.
2. Build a CStringBuilder struct that lets you incrementally build a null-terminated string using an allocator. It should have: init(allocator) CStringBuilder, append(self, bytes: []const u8) !void, toOwnedSlice(self) ![:0]u8 (returns the final null-terminated string, transfers ownership to the caller), and deinit(self) void. Internally use an ArrayList(u8). The key insight: toOwnedSlice needs to ensure the final byte is the sentinel. Test by appending multiple string fragments, calling toOwnedSlice, and verifying the result is properly null-terminated with the correct content.
3. Use @cImport and @cInclude("stdlib.h") to call C's getenv("PATH") from Zig. Convert the result to a Zig [:0]const u8 using std.mem.span(). Then split the PATH by : (or ; on Windows), iterate the parts, and print each directory on its own line. Handle the case where PATH is not set (null return). Build with exe.linkLibC() in your build.zig (as we covered in ep015). The exercise ties together C interop, sentinel types, build system configuration, and standard library string operations.
Sooooo, what have we learned?
- Sentinel-terminated types encode the terminator value in the type system. [:0]u8 = slice guaranteed to end with a zero byte at slice[slice.len]. The compiler enforces this.
- Four types in the sentinel family: [N:0]u8 (fixed array), [:0]u8 (slice + sentinel), [*:0]u8 (many-pointer + sentinel), [*c]u8 (C pointer -- most permissive, can be null, no guarantees).
- String literals are *const [N:0]u8 -- they coerce automatically to [:0]const u8, []const u8, [*:0]const u8, and [*c]const u8. You can pass string literals directly to C functions.
- std.mem.span() converts a sentinel many-pointer [*:0]const u8 to a sentinel slice [:0]const u8 by scanning for the sentinel. This is your primary tool for consuming C string return values.
- allocator.allocSentinel(u8, len, 0) allocates len + 1 bytes and sets the last to zero. This is how you create heap-allocated null-terminated strings from regular Zig slices.
- The type system prevents you from passing non-terminated slices to functions that expect sentinel-terminated data. This catches C interop bugs at compile time rather than as runtime memory corruption.
- @cImport translates C's char* to [*c]u8 and const char* to [*c]const u8. These C pointers can be null, so check before calling std.mem.span(). Hand-written bindings usually declare nullable C strings as ?[*:0]u8 instead, which forces you to handle the null case.
We've now got the full picture of how Zig talks to C at the string level. Next time we're going lower -- into packed structs and bit manipulation, where you control the exact memory layout of your data down to individual bits. If you've ever needed to parse a binary protocol or talk to hardware registers, that's the episode for you ;-)