Why your middleware should never call the RTOS directly

There is a specific shape of technical debt that shows up in embedded SDKs roughly five years into a platform’s life.

The team wants to add support for a second RTOS — either because a major customer has standardised on it, or because the regulatory or certification picture has shifted, or because a new generation of silicon ships with a different default kernel.

The work begins, and then stalls. Not in the kernel layer, where the new RTOS slots in cleanly enough. In the middleware: the BLE host, the file system, the MQTT client, the Wi-Fi supplicant, every protocol stack the SDK has accumulated. Each of them has FreeRTOS task handles, semaphore APIs, and queue primitives sprinkled through the code, and porting them is not a refactor. It is a rewrite.

This is the cost of letting middleware call the RTOS directly. It looks free for the first three years of a platform’s life, and then it gets paid all at once.

The alternative is an OS Abstraction Layer (OSAL) — a thin, deliberately small layer that defines a canonical set of synchronisation and task primitives, and that middleware is required to use exclusively.

The OSAL is not a sophisticated piece of software. It is a discipline boundary. The reason it matters is that the discipline is impossible to retrofit at scale, and the cost of operating without it compounds in ways that don’t show up until the platform is already locked into the wrong answer.

This post walks through what direct RTOS coupling actually costs, what an OSAL is contractually responsible for, and the implementation pitfalls that determine whether the OSAL is genuinely portable or just appears to be.

The RTOS is not a library, it is a structural dependency

A real-time operating system is rarely an optional add-on in a modern embedded SDK. Protocol stacks, file systems, and communication clients all require task scheduling, mutual exclusion, and inter-task signaling primitives.

An SDK that does not define a clear RTOS integration strategy forces every developer to solve the same portability problem independently, producing fragmented application code that the SDK team did not write and cannot maintain.

The right framing is that the RTOS should be treated not as an external library that developers source themselves, but as a versioned, validated SDK component with a defined integration boundary. A bare-metal SDK may include the RTOS as an optional, selectable component enabled via a Kconfig symbol.

A connectivity SDK, where the BLE or Thread stack requires deterministic scheduling, may mandate a specific RTOS or a constrained subset of RTOS primitives. Either way, the SDK has to document the supported RTOS configurations explicitly, specify which RTOS version has been validated against each SDK release, and provide working reference configurations for each supported kernel.

That last requirement is the one that forces the OSAL question. If the SDK supports three RTOSes, and middleware is allowed to call any of them directly, then every middleware component has to be ported, tested, and maintained against three kernel APIs in parallel. The number of test combinations grows with the number of components multiplied by the number of kernels, and the maintenance cost of every kernel-specific code path compounds across the lifetime of the platform.

Four problems that direct RTOS calls cause

The case for an OSAL is best made by being concrete about what direct RTOS coupling actually costs. There are four recurring failure modes, and they are not equally visible.

Middleware portability collapses

The first and most obvious failure mode is that middleware components which call RTOS primitives directly become non-portable. A BLE stack written against the FreeRTOS API cannot be reused on a Zephyr-based product without a full rewrite. The component is not abstracted; it is welded to its kernel.

This is the failure mode that drives most OSAL adoption decisions, but it is usually noticed too late. By the time the team wants to port the BLE stack, the FreeRTOS-specific calls have proliferated through error paths, callback registration, and timer handling — places that were touched briefly during feature work and never revisited. The rewrite is not just translating API calls; it is re-validating every code path that depends on kernel-specific semantics.

ISR and task context boundaries get violated

Many RTOS primitives have separate ISR-safe variants — xSemaphoreGiveFromISR versus xSemaphoreGive, for instance — and calling the wrong variant from the wrong context produces undefined behavior that is difficult to reproduce and diagnose. When middleware calls the RTOS directly, the responsibility for picking the right variant lives wherever the call is written, which is everywhere.

An OSAL fixes this by exposing ISR-safe variants explicitly, and by documenting which HAL callbacks execute in interrupt context with a clear list of which OSAL primitives are safe to call within them. The boundary is enforced by the API surface rather than by convention. Middleware code that needs to signal a task from a DMA completion callback uses osal_sem_give_from_isr(), and that name is the documentation.
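A minimal host-side sketch of what that split might look like follows. Everything here is an assumption made for illustration: the status codes, the `osal_sem_t` layout, and the `mock_in_isr` flag, which stands in for a real interrupt-context query (on a Cortex-M part that would typically read the IPSR register). The point is only that each variant can check its calling context and refuse cleanly when it is used from the wrong one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical OSAL status codes; the names are illustrative only. */
typedef enum { OSAL_OK = 0, OSAL_ERR_CONTEXT, OSAL_ERR_FULL } osal_status_t;

/* Host-side stand-in for a counting semaphore. */
typedef struct {
    uint32_t count;
    uint32_t max_count;
} osal_sem_t;

/* A real port would query the CPU or kernel here. A plain flag keeps the
 * sketch runnable on a host. */
static bool mock_in_isr = false;
static bool osal_in_isr(void) { return mock_in_isr; }

static osal_status_t sem_increment(osal_sem_t *sem)
{
    if (sem->count >= sem->max_count) return OSAL_ERR_FULL;
    sem->count++;
    return OSAL_OK;
}

/* Task-context give: refuses to run from interrupt context. */
osal_status_t osal_sem_give(osal_sem_t *sem)
{
    assert(sem != NULL);
    if (osal_in_isr()) return OSAL_ERR_CONTEXT;
    return sem_increment(sem);
}

/* ISR-context give: refuses to run from task context, and reports whether
 * the caller should request a context switch on interrupt exit. */
osal_status_t osal_sem_give_from_isr(osal_sem_t *sem, bool *task_woken)
{
    assert(sem != NULL && task_woken != NULL);
    *task_woken = false;
    if (!osal_in_isr()) return OSAL_ERR_CONTEXT;
    osal_status_t st = sem_increment(sem);
    *task_woken = (st == OSAL_OK);
    return st;
}

/* Example middleware callback, as a DMA driver might invoke it from an ISR. */
static osal_sem_t rx_done = { 0, 1 };

bool dma_complete_callback(void)
{
    bool woken = false;
    (void)osal_sem_give_from_isr(&rx_done, &woken);
    return woken;
}
```

Returning an error on a context mismatch is one possible policy. A port could equally assert in debug builds, as long as the behavior is the same on every kernel.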

Heap and memory model mismatches go silent

RTOS kernels differ fundamentally in their memory allocation models, and these differences are not visible at the API call site. FreeRTOS offers five heap schemes with varying fragmentation and determinism characteristics. Zephyr defaults to fully static kernel object allocation with no dynamic heap in the default configuration, although configurable dynamic allocation is available. ThreadX allocates from application-defined byte pools and block pools, each with its own allocation discipline.

An SDK that bundles middleware components with implicit heap dependencies may silently violate the memory model of a target RTOS, producing runtime failures that are not caught at compile time. A middleware component that allocates a queue dynamically at startup will work fine on FreeRTOS with heap_4 and fail silently on a Zephyr build with no dynamic allocation enabled — and the failure manifests not as a clean error, but as a queue-related fault several layers deeper in the stack.

The OSAL is where this gets standardised. Either the OSAL exposes only static-allocation primitives, in which case middleware is forced to declare its objects at compile time and the memory model is uniform across kernels, or the OSAL exposes both static and dynamic primitives with a clear contract about which configurations support which mode.
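As a rough sketch of the static-only option, the block below declares a queue whose storage is reserved at compile time. The `OSAL_QUEUE_DEFINE` macro, the `osal_queue_t` layout, and the `mqtt_rx_queue` name are hypothetical, and the ring buffer is a single-threaded host stand-in with no locking. A real port would presumably map the same declaration onto the kernel's own static facility, such as `xQueueCreateStatic` on FreeRTOS or `K_MSGQ_DEFINE` on Zephyr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical queue control block. Storage is supplied by the caller at
 * compile time, so no heap is involved on any kernel. */
typedef struct {
    uint8_t *storage;
    size_t   item_size;
    size_t   capacity;
    size_t   head;
    size_t   count;
} osal_queue_t;

/* Declares the backing buffer and the control block together. */
#define OSAL_QUEUE_DEFINE(name, item_type, depth)                     \
    static uint8_t name##_storage[(depth) * sizeof(item_type)];       \
    static osal_queue_t name = { name##_storage, sizeof(item_type),   \
                                 (depth), 0, 0 }

/* Copies one item in; returns 0 on success, -1 if the queue is full. */
int osal_queue_put(osal_queue_t *q, const void *item)
{
    assert(q != NULL && item != NULL);
    if (q->count == q->capacity) return -1;
    size_t tail = (q->head + q->count) % q->capacity;
    memcpy(&q->storage[tail * q->item_size], item, q->item_size);
    q->count++;
    return 0;
}

/* Copies the oldest item out; returns 0 on success, -1 if the queue is empty. */
int osal_queue_get(osal_queue_t *q, void *item)
{
    assert(q != NULL && item != NULL);
    if (q->count == 0) return -1;
    memcpy(item, &q->storage[q->head * q->item_size], q->item_size);
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    return 0;
}

/* Middleware declares its objects up front; the footprint is visible in the map file. */
OSAL_QUEUE_DEFINE(mqtt_rx_queue, uint32_t, 4);
```

With this shape, a build that cannot satisfy the declaration fails at compile or link time, not at runtime inside the stack.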

The version and configuration surface drifts

An RTOS is itself a versioned component with a non-trivial configuration. FreeRTOS is configured via FreeRTOSConfig.h. Zephyr via Kconfig. ThreadX via compile-time defines. A change to a single configuration parameter — tick rate, stack size, priority range — can break middleware behavior in ways that are not immediately obvious from the configuration diff.

An SDK that ships a validated RTOS configuration for each supported target, and that treats changes to that configuration with the same versioning discipline applied to public APIs, contains this problem. An SDK that lets each customer assemble their own RTOS configuration and then ships middleware that depends on specific values of those parameters ends up with a support load that is impossible to triage.
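One way to make those dependencies visible is to have the OSAL state them as compile-time checks, so a configuration outside the validated range fails the build instead of changing behavior silently. The sketch below assumes hypothetical `OSAL_CFG_*` macros that each port would derive from its kernel's real configuration. The specific limits are placeholders, not recommendations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration values. In a real SDK each port would derive
 * them from FreeRTOSConfig.h, Kconfig output, or the ThreadX defines. */
#ifndef OSAL_CFG_TICK_RATE_HZ
#define OSAL_CFG_TICK_RATE_HZ 1000
#endif
#ifndef OSAL_CFG_NUM_PRIORITIES
#define OSAL_CFG_NUM_PRIORITIES 8
#endif
#ifndef OSAL_CFG_MIN_STACK_BYTES
#define OSAL_CFG_MIN_STACK_BYTES 1024
#endif

/* The range the middleware was validated against (placeholder limits). */
static_assert(OSAL_CFG_TICK_RATE_HZ >= 100 && OSAL_CFG_TICK_RATE_HZ <= 1000,
              "tick rate is outside the validated range");
static_assert(1000 % OSAL_CFG_TICK_RATE_HZ == 0,
              "tick period must be a whole number of milliseconds");
static_assert(OSAL_CFG_NUM_PRIORITIES >= 5,
              "middleware needs at least five distinct priority levels");
static_assert(OSAL_CFG_MIN_STACK_BYTES >= 1024,
              "minimum task stack is smaller than the validated value");

/* Tick period in milliseconds, exact because of the check above. */
uint32_t osal_cfg_tick_period_ms(void)
{
    return 1000u / OSAL_CFG_TICK_RATE_HZ;
}
```

A customer who changes the tick rate then gets a build error naming the violated assumption, which is far easier to triage than a timing regression in the field.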

What an OSAL actually contains

The OSAL maps a canonical set of primitives to the underlying RTOS. It is the contractual boundary that decouples middleware from the kernel, and it has to be defined before any middleware component is written — because once middleware exists with direct RTOS calls in it, the OSAL stops being an abstraction layer and starts being an aspirational one.

The minimum primitive set for a useful OSAL is small: mutex, semaphore (counting and binary), queue, software timer, and task. Each primitive has a defined creation function, a defined destruction function, and a small set of operations.

ISR-safe variants of the operations that need them are exposed explicitly. Timeouts on blocking operations are expressed in milliseconds, with a defined sentinel value for “wait forever.”

The OSAL header is portable; each RTOS ships a separate implementation file (osal_freertos.c, osal_zephyr.c, osal_threadx.c). Middleware links against the OSAL header only, never against an RTOS directly. This architecture allows a middleware component to be validated once and deployed across all supported RTOS targets without re-running the full test suite for each kernel. In practice, an integration test pass on each target RTOS is still a release gate, because the OSAL implementations themselves are code that has to be tested.
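The header-plus-implementation split can be sketched in a single file for illustration. The function names, the status codes, and the `osal_host.c` port below are assumptions: a single-threaded stub with a fixed pool, standing in for the per-kernel implementation files. Only the mutex primitive is shown; the others would follow the same pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- osal.h: portable, the only kernel-facing header middleware sees ---- */
#define OSAL_WAIT_FOREVER UINT32_MAX   /* sentinel; each port must translate it */

typedef enum { OSAL_OK = 0, OSAL_ERR_TIMEOUT, OSAL_ERR_PARAM } osal_status_t;

typedef struct osal_mutex osal_mutex_t;   /* opaque: the layout belongs to the port */

osal_mutex_t *osal_mutex_create(void);
osal_status_t osal_mutex_lock(osal_mutex_t *m, uint32_t timeout_ms);
osal_status_t osal_mutex_unlock(osal_mutex_t *m);
void          osal_mutex_destroy(osal_mutex_t *m);

/* ---- osal_host.c: hypothetical single-threaded port for host tests ---- */
#define OSAL_HOST_MAX_MUTEXES 4

struct osal_mutex {
    bool in_use;
    bool locked;
};

static struct osal_mutex mutex_pool[OSAL_HOST_MAX_MUTEXES];

osal_mutex_t *osal_mutex_create(void)
{
    for (size_t i = 0; i < OSAL_HOST_MAX_MUTEXES; i++) {
        if (!mutex_pool[i].in_use) {
            mutex_pool[i].in_use = true;
            mutex_pool[i].locked = false;
            return &mutex_pool[i];
        }
    }
    return NULL;   /* pool exhausted */
}

osal_status_t osal_mutex_lock(osal_mutex_t *m, uint32_t timeout_ms)
{
    (void)timeout_ms;   /* no scheduler in this stub, so there is nothing to wait on */
    if (m == NULL || !m->in_use) return OSAL_ERR_PARAM;
    if (m->locked) return OSAL_ERR_TIMEOUT;
    m->locked = true;
    return OSAL_OK;
}

osal_status_t osal_mutex_unlock(osal_mutex_t *m)
{
    if (m == NULL || !m->in_use || !m->locked) return OSAL_ERR_PARAM;
    m->locked = false;
    return OSAL_OK;
}

void osal_mutex_destroy(osal_mutex_t *m)
{
    assert(m != NULL);
    m->in_use = false;
    m->locked = false;
}

/* ---- middleware: written against the osal.h declarations only ---- */
static uint32_t shared_counter;

osal_status_t counter_increment(osal_mutex_t *lock)
{
    osal_status_t st = osal_mutex_lock(lock, OSAL_WAIT_FOREVER);
    if (st != OSAL_OK) return st;
    shared_counter++;
    return osal_mutex_unlock(lock);
}
```

Because `counter_increment` never sees the struct layout or a kernel type, swapping the host port for a FreeRTOS or Zephyr port should not require touching it.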

The “wait forever” trap, and what it tells you about OSAL discipline

There is one specific implementation pitfall worth dwelling on, because it illustrates how subtle the failure modes can be even after an OSAL exists.

Most RTOS APIs accept a timeout parameter for blocking operations, and most of them define a sentinel value meaning “wait without timeout.” FreeRTOS uses portMAX_DELAY. Zephyr uses K_FOREVER. ThreadX uses TX_WAIT_FOREVER. These constants are not guaranteed to be numerically equal (the value of portMAX_DELAY, for one, depends on the configured tick width), and they are not even of the same type: K_FOREVER is a struct in modern Zephyr, not an integer at all.

If every port translates the OSAL’s “wait forever” sentinel into the underlying kernel’s own sentinel, the middleware code is portable. If even one port skips that translation, it is not. Suppose the OSAL’s sentinel is 0xFFFFFFFF and the FreeRTOS port maps it directly to portMAX_DELAY, but the Zephyr port converts it like any other millisecond value because K_FOREVER is a different type. The Zephyr build then requests a timeout of roughly 4.29 billion milliseconds. The middleware works on FreeRTOS, returns from the wait approximately fifty days later on Zephyr, and looks identical at the source level.
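The translation each port owes the middleware can be sketched against two mock kernels. The `mock_*` types and constants below are stand-ins invented for this example: one imitates a tick-count API, the other a struct-typed timeout. The 100 Hz tick rate is likewise an assumption. In both ports the sentinel is checked before any arithmetic is done.

```c
#include <assert.h>
#include <stdint.h>

#define OSAL_WAIT_FOREVER UINT32_MAX   /* hypothetical OSAL sentinel, in ms */

/* Mock of a tick-count timeout API. */
typedef uint32_t mock_tick_t;
#define MOCK_MAX_DELAY    ((mock_tick_t)0xFFFFFFFFu)
#define MOCK_TICK_RATE_HZ 100u

/* Mock of a struct-typed timeout API, where "forever" is not a duration. */
typedef struct { int64_t ticks; } mock_timeout_t;
#define MOCK_FOREVER ((mock_timeout_t){ -1 })

static_assert(MOCK_TICK_RATE_HZ > 0u && MOCK_TICK_RATE_HZ <= 1000u,
              "conversion below assumes at most one tick per millisecond");

/* Rounds up so that a nonzero timeout never collapses to zero ticks. */
static uint64_t ms_to_ticks_ceil(uint32_t ms)
{
    return ((uint64_t)ms * MOCK_TICK_RATE_HZ + 999u) / 1000u;
}

mock_tick_t osal_ms_to_mock_ticks(uint32_t ms)
{
    if (ms == OSAL_WAIT_FOREVER) return MOCK_MAX_DELAY;   /* sentinel first */
    uint64_t ticks = ms_to_ticks_ceil(ms);
    /* Clamp below the kernel sentinel so a long finite wait cannot alias to forever. */
    if (ticks >= MOCK_MAX_DELAY) ticks = (uint64_t)MOCK_MAX_DELAY - 1u;
    return (mock_tick_t)ticks;
}

mock_timeout_t osal_ms_to_mock_timeout(uint32_t ms)
{
    if (ms == OSAL_WAIT_FOREVER) return MOCK_FOREVER;     /* sentinel first */
    mock_timeout_t t = { (int64_t)ms_to_ticks_ceil(ms) };
    return t;
}
```

Dropping the first line of `osal_ms_to_mock_timeout` reproduces the bug described above: the function still compiles, still returns a valid timeout, and simply waits for weeks.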

This kind of bug is what OSAL conformance test suites exist to catch. Every OSAL implementation has to be tested against the full primitive set, including the edge cases — wait-forever, zero-timeout, timeout precisely equal to one tick, ISR variants called from task context (which should fail cleanly rather than corrupt state), and so on. The OSAL is small enough that the conformance suite is tractable to write and maintain. It is also small enough that it is tempting not to bother, which is how the wait-forever class of bug ends up shipping.
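A conformance suite for the timeout edge cases might be table-driven, along the lines of the sketch below. The `translate_fn` hook is a hypothetical interface through which a port reports what it would ask the kernel for. The two mock ports, and the 10 ms tick period they share, are assumptions chosen so that one passes and one exhibits the wait-forever bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OSAL_WAIT_FOREVER UINT32_MAX
#define TICK_PERIOD_MS    10u   /* assumed tick period for both mock ports */

/* Hypothetical hook: returns true when the port would block indefinitely,
 * otherwise stores the tick count it would request from the kernel. */
typedef bool (*translate_fn)(uint32_t timeout_ms, uint64_t *ticks_out);

static bool good_translate(uint32_t ms, uint64_t *ticks_out)
{
    if (ms == OSAL_WAIT_FOREVER) return true;
    *ticks_out = ((uint64_t)ms + TICK_PERIOD_MS - 1u) / TICK_PERIOD_MS;
    return false;
}

/* Deliberately broken port: treats the sentinel as an ordinary duration. */
static bool buggy_translate(uint32_t ms, uint64_t *ticks_out)
{
    *ticks_out = ((uint64_t)ms + TICK_PERIOD_MS - 1u) / TICK_PERIOD_MS;
    return false;
}

typedef struct {
    const char *name;          /* available for logging in a fuller suite */
    uint32_t    timeout_ms;
    bool        expect_forever;
    uint64_t    expect_ticks;  /* ignored when expect_forever is true */
} timeout_case_t;

static const timeout_case_t timeout_cases[] = {
    { "wait forever",       OSAL_WAIT_FOREVER, true,  0 },
    { "zero timeout",       0u,                false, 0 },
    { "exactly one tick",   TICK_PERIOD_MS,    false, 1 },
    { "sub-tick rounds up", 1u,                false, 1 },
};

/* Runs every case against one port and returns the number of failures. */
size_t osal_conformance_failures(translate_fn translate)
{
    assert(translate != NULL);
    size_t failures = 0;
    for (size_t i = 0; i < sizeof timeout_cases / sizeof timeout_cases[0]; i++) {
        const timeout_case_t *c = &timeout_cases[i];
        uint64_t ticks = 0;
        bool forever = translate(c->timeout_ms, &ticks);
        bool ok = (forever == c->expect_forever) &&
                  (forever || ticks == c->expect_ticks);
        if (!ok) failures++;
    }
    return failures;
}
```

The same table would run unchanged against each real port, which is what keeps the suite small enough to maintain.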

The discipline question

The technical content of an OSAL is simple. The discipline required to operate one is not. The rule has to be that middleware uses OSAL primitives exclusively, never RTOS APIs directly, and that this rule is enforced — by code review, by static analysis, or by build-system checks that fail when middleware code includes an RTOS header.

The reason this is hard is that RTOS APIs are right there. Every developer working on middleware knows the FreeRTOS or Zephyr API for the kernel they’re targeting, and reaching for it directly is faster than looking up the OSAL equivalent.

Once one direct call has been made and merged, the precedent is set, and the next direct call is easier to justify than the first. A year later, the OSAL is a layer that sits beside the middleware rather than underneath it, and the portability claim that the OSAL was supposed to provide is gone.

The discipline shows up in three places. The OSAL has to cover the actual needs of middleware, including the awkward edge cases — if the OSAL doesn’t expose what middleware needs, middleware will route around it. Middleware code review has to treat any RTOS header inclusion as a defect, with the same severity as a compile error.

And the SDK build system has to enforce the boundary mechanically, by failing the build when middleware code includes an RTOS-specific header. Mechanical enforcement is the only one of the three that survives organisational change without degrading.
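The mechanical check can be very small. The sketch below is one possible shape for it: a function that scans a middleware source file, already loaded into memory, for `#include` lines that name a kernel header. The function name and the contents of the forbidden list are assumptions. The scan is deliberately naive: it does not handle includes hidden behind macros or inside block comments, so a production check would more likely work from the compiler's dependency output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kernel headers that middleware must not include (illustrative list). */
static const char *const forbidden_headers[] = {
    "FreeRTOS.h", "semphr.h", "zephyr/kernel.h", "tx_api.h",
};

/* Bounded substring search within one line. */
static int span_contains(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = 0; i + n <= len; i++) {
        if (strncmp(s + i, needle, n) == 0) return 1;
    }
    return 0;
}

/* Returns the first forbidden header named on an #include line, or NULL. */
const char *find_forbidden_include(const char *source)
{
    assert(source != NULL);
    const char *line = source;
    while (*line != '\0') {
        const char *nl  = strchr(line, '\n');
        const char *end = nl ? nl : line + strlen(line);

        const char *p = line;
        while (p < end && (*p == ' ' || *p == '\t')) p++;
        if (p < end && *p == '#') {
            p++;
            while (p < end && (*p == ' ' || *p == '\t')) p++;
            if ((size_t)(end - p) >= 7 && strncmp(p, "include", 7) == 0) {
                size_t count = sizeof forbidden_headers / sizeof forbidden_headers[0];
                for (size_t i = 0; i < count; i++) {
                    if (span_contains(p, (size_t)(end - p), forbidden_headers[i])) {
                        return forbidden_headers[i];
                    }
                }
            }
        }
        line = nl ? nl + 1 : end;
    }
    return NULL;
}
```

Wired into the build as a step that fails on any non-NULL result, a check like this needs no reviewer to remember the rule.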

What this means in practice

An SDK that enforces the OSAL boundary from the first release can support multiple RTOS targets with a single middleware codebase, significantly reducing the validation and maintenance cost of supporting a broad developer ecosystem.

It can also adopt a new RTOS — for a new silicon generation, a new certification target, or a new partner — by writing one OSAL implementation and validating it, rather than rewriting every middleware component.

An SDK that does not enforce it eventually requires a costly architectural refactor to achieve the same result. The refactor is hard because it touches every middleware component the platform has shipped, and because customer code may have pattern-matched the SDK’s own direct-RTOS-call style and adopted it in their applications.

Removing direct RTOS calls from middleware does not finish the job — the SDK has to either keep the RTOS APIs exposed for backward compatibility, or break customer code, and neither answer is good.

The OSAL is one of the highest-leverage architectural decisions in SDK design, and one of the most commonly under-specified. It is also one of the few that genuinely cannot be retrofitted without significant cost. If the platform is early enough that this decision is still open, it is worth the investment to make it deliberately. If the platform is later than that, the right time to start is now, with the smallest workable OSAL surface and a strict rule that all new middleware uses it.


needCode designs embedded SDKs with OSAL boundaries enforced from the first release — FreeRTOS, Zephyr, and ThreadX supported from a single middleware codebase. If you’re architecting a new platform or facing the cost of adding a second RTOS to an existing one, let’s talk.


Further reading

FreeRTOS / Zephyr OS — the two RTOS targets whose portability problem this post’s OSAL pattern is designed to solve
Opaque Handles, Vtables, and Device Trees — the HAL abstraction layer that sits below the middleware layer described in this post
What Makes a Wireless SDK Different from an MCU SDK — the BLE and Thread stacks that are the most costly middleware components to rewrite when RTOS portability breaks
Monolithic, Meta-Tool, or Registry — the SDK delivery model that determines how OSAL implementations are packaged and versioned alongside middleware
