How we did it
#1 A maintainable firmware foundation
The firmware runs bare-metal, with no RTOS beneath it — a deliberate trade. A kernel would have provided task scheduling and a pre-validated driver library, but it would also have cost the tight control over timing, memory layout, and power consumption that a battery-powered safety device lives or dies by. So every driver and peripheral abstraction was written from scratch against the silicon's low-level register interface, and the codebase was organised into three strictly separated layers. Drivers handle register-level access and nothing else. Modules compose those primitives into reusable, independently testable behaviours: the BLE stack, the IR sensor, the LED, the buzzer, the relay control line. The application layer, the only product-specific code in the system, holds the detection logic, the alarm state machine, the user interface, and the self-test orchestration. The payoff is long-term maintainability: a change at the driver level is absorbed without touching safety logic, and a change in the alarm logic never propagates down into driver code. Those boundaries aren't a convention anyone can quietly bypass; they are enforced by the build system and reviewed at every change.
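The three-layer split can be illustrated with a minimal sketch in C. Every name here (`gpio_regs_t`, `gpio_drv_write`, `buzzer_t`, `app_on_alarm_state`) is hypothetical and chosen for illustration; the register layout is invented and does not describe the real silicon. The point is only the direction of the dependencies: the application calls the module, the module calls the driver, and nothing calls upward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Driver layer: register-level access and nothing else ---- */
typedef struct {
    volatile uint32_t out;              /* hypothetical output latch register */
} gpio_regs_t;

void gpio_drv_write(gpio_regs_t *regs, uint8_t pin, bool level)
{
    assert(regs != NULL && pin < 32u);
    if (level) {
        regs->out |= (1u << pin);
    } else {
        regs->out &= ~(1u << pin);
    }
}

/* ---- Module layer: a reusable behaviour built from driver primitives ---- */
typedef struct {
    gpio_regs_t *port;                  /* injected, so tests can pass a fake */
    uint8_t      pin;
} buzzer_t;

void buzzer_init(buzzer_t *bz, gpio_regs_t *port, uint8_t pin)
{
    assert(bz != NULL && port != NULL);
    bz->port = port;
    bz->pin  = pin;
    gpio_drv_write(port, pin, false);   /* start silent */
}

void buzzer_set(buzzer_t *bz, bool on)
{
    assert(bz != NULL);
    gpio_drv_write(bz->port, bz->pin, on);
}

/* ---- Application layer: product-specific logic, no register knowledge ---- */
typedef enum { ALARM_IDLE, ALARM_ACTIVE } alarm_state_t;

void app_on_alarm_state(buzzer_t *bz, alarm_state_t state)
{
    buzzer_set(bz, state == ALARM_ACTIVE);
}
```

Because the register block is passed in as a pointer, the module and application code can be exercised on a host machine against a plain struct, which is one way the "independently testable" property could be achieved.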
#2 BLE power optimisation through protocol redesign
Battery life was a primary commercial KPI rather than a finishing touch. Customers don't tolerate frequent battery changes in a ceiling-mounted safety device. The first implementation followed the conventional pattern: the sensor and control unit held a persistent BLE connection and exchanged link-layer keepalives at every connection interval. That is simple and low-latency, but for a device that sends a meaningful payload only occasionally, most of the energy budget ends up spent on maintaining the link rather than on moving data. The redesign turned the asymmetry between the two devices into an advantage. The battery-powered sensor now stays disconnected by default, drawing near-zero radio energy, and starts advertising only when it actually has something to say: a status update, a heartbeat, or an alarm. The mains-powered control unit, which has energy to spare, scans continuously, picks up the advertisement, connects, pulls the message, and drops the link again. Connection intervals, slave latency, supervision timeouts, and advertising parameters were then tuned on top of that base.
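The sensor side of that exchange can be sketched as a small state machine. This is an illustrative model only: the state and event names are assumptions, no real BLE stack API is called, and the rule that a leftover message triggers a fresh round of advertising is a guess at sensible behaviour rather than something the project description confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link states for the battery-powered sensor. */
typedef enum { LINK_IDLE, LINK_ADVERTISING, LINK_CONNECTED } link_state_t;

/* Hypothetical events, raised by the application or by the radio stack. */
typedef enum {
    EV_MESSAGE_QUEUED,   /* status update, heartbeat, or alarm is ready  */
    EV_CONNECTED,        /* control unit saw the advertisement, connected */
    EV_MESSAGE_READ,     /* control unit pulled one pending message       */
    EV_DISCONNECTED      /* control unit dropped the link                 */
} link_event_t;

typedef struct {
    link_state_t state;
    unsigned     pending;   /* messages waiting to be pulled */
} sensor_link_t;

void sensor_link_step(sensor_link_t *link, link_event_t ev)
{
    assert(link != NULL);
    switch (ev) {
    case EV_MESSAGE_QUEUED:
        link->pending++;
        if (link->state == LINK_IDLE) {
            link->state = LINK_ADVERTISING;   /* radio wakes only now */
        }
        break;
    case EV_CONNECTED:
        if (link->state == LINK_ADVERTISING) {
            link->state = LINK_CONNECTED;
        }
        break;
    case EV_MESSAGE_READ:
        if (link->state == LINK_CONNECTED && link->pending > 0u) {
            link->pending--;
        }
        break;
    case EV_DISCONNECTED:
        /* Assumption: advertise again if something is still undelivered. */
        link->state = (link->pending > 0u) ? LINK_ADVERTISING : LINK_IDLE;
        break;
    }
}

/* The radio may stay powered down whenever the link is idle. */
bool sensor_radio_needed(const sensor_link_t *link)
{
    assert(link != NULL);
    return link->state != LINK_IDLE;
}
```

In this model the radio is needed only between a message being queued and the control unit dropping the link, which is the property the redesign relies on.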
#3 A safety algorithm proven against real data
The detection algorithm reads the heatmap from an 8×8 IR thermal array and classifies the scene as idle, normal cooking, or warning/alarm, carrying state across frames so it responds to sustained patterns rather than to a single transient spike. The algorithm itself was specified by the client's domain experts from years of field data on stove-fire behaviour; our work was to implement it in firmware, integrate it cleanly behind the IR module's interface, and validate it. Validation is where the real discipline sits, because staging genuine fire events under controlled conditions is demanding, costly, and not something that can be repeated at will. The scenarios were therefore captured once, in carefully instrumented test runs, and turned into a labelled measurement database — roughly a thousand normal-cooking sequences alongside a smaller set of real fire and warning events. A simulation harness then replays those recordings into the very same module interface the live sensor feeds, and checks each classification against its expected label. From that point on, regression-testing the classifier across hundreds of cases became a routine, automatable activity.
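A replay harness of this kind can be sketched as follows. The classifier below is a deliberately simple placeholder, not the client's algorithm: the thresholds, the peak-temperature rule, and the three-frame persistence count are all invented for illustration. The recording layout and the choice to compare the label against the classification of the final frame are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define IR_PIXELS 64u                 /* 8x8 thermal array */

typedef enum { SCENE_IDLE, SCENE_COOKING, SCENE_WARNING_ALARM } scene_t;

/* State carried across frames so a single hot frame cannot raise an alarm. */
typedef struct { unsigned hot_streak; } classifier_t;

/* Placeholder values, NOT taken from the real specification. */
#define COOKING_THRESHOLD_C   60.0f
#define ALARM_THRESHOLD_C     250.0f
#define ALARM_PERSIST_FRAMES  3u

scene_t classifier_feed(classifier_t *c, const float *frame)
{
    assert(c != NULL && frame != NULL);
    float peak = frame[0];
    for (size_t i = 1; i < IR_PIXELS; i++) {
        if (frame[i] > peak) {
            peak = frame[i];
        }
    }
    if (peak >= ALARM_THRESHOLD_C) {
        c->hot_streak++;
    } else {
        c->hot_streak = 0u;           /* transient spike: streak resets */
    }
    if (c->hot_streak >= ALARM_PERSIST_FRAMES) {
        return SCENE_WARNING_ALARM;
    }
    return (peak >= COOKING_THRESHOLD_C) ? SCENE_COOKING : SCENE_IDLE;
}

/* One labelled recording: frame_count frames stored back to back. */
typedef struct {
    const float *frames;              /* frame_count * IR_PIXELS values */
    size_t       frame_count;
    scene_t      expected;            /* label for the final classification */
} recording_t;

/* Replays every recording through the same entry point the live sensor
 * would use and returns how many end on a classification that differs
 * from the label. */
size_t replay_count_mismatches(const recording_t *recs, size_t count)
{
    size_t mismatches = 0;
    for (size_t r = 0; r < count; r++) {
        classifier_t c = { 0u };      /* fresh state per recording */
        scene_t last = SCENE_IDLE;
        for (size_t f = 0; f < recs[r].frame_count; f++) {
            last = classifier_feed(&c, &recs[r].frames[f * IR_PIXELS]);
        }
        if (last != recs[r].expected) {
            mismatches++;
        }
    }
    return mismatches;
}
```

A regression run then reduces to loading the database into an array of `recording_t` and requiring `replay_count_mismatches` to return zero.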
#4 Safety that fails loud, never silent
For a safety device, failing silently is worse than not being there at all: the user trusts that the unit on the ceiling is watching the stove, and a device that has quietly stopped working betrays that trust invisibly. The firmware is built so the guarantee is structural rather than assumed. At boot, a power-on self-test exercises every peripheral the device depends on (the IR sensor, the radio, the battery monitor, the user-interface elements), and the device enters its monitoring state only if all of them pass; a failure is reported through coded LED and buzzer signals that identify the subsystem at fault. Because the product is expected to run for years between battery changes, a lighter periodic self-test repeats the critical checks every few hours during normal operation, without disturbing the device's monitoring duty. Unrecoverable faults move the device into a defined, externally visible fault state rather than a silent reset or an undefined condition. A heartbeat contract requires the sensor to announce its presence at a fixed interval; if that heartbeat is missed beyond tolerance, the control unit disconnects mains power to the stove. Continued operation has to be earned by continuous, verifiable proof that the safety chain is intact.
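Two of these mechanisms lend themselves to a short sketch: a self-test that collects failures into a fault mask, and the control unit's heartbeat watchdog. All identifiers, the bit assignments, and the millisecond tick are assumptions made for illustration; the stub check functions stand in for real peripheral tests. How power is restored after a cut is not covered by the description above and is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault bits, one per subsystem the self-test exercises. */
enum {
    FAULT_IR_SENSOR       = 1 << 0,
    FAULT_RADIO           = 1 << 1,
    FAULT_BATTERY_MONITOR = 1 << 2,
    FAULT_USER_INTERFACE  = 1 << 3
};

typedef bool (*self_check_fn)(void);
typedef struct { uint32_t fault_bit; self_check_fn check; } self_check_t;

/* Runs every check and returns a mask of the subsystems that failed. */
uint32_t run_self_test(const self_check_t *checks, size_t count)
{
    assert(checks != NULL || count == 0u);
    uint32_t faults = 0u;
    for (size_t i = 0; i < count; i++) {
        if (!checks[i].check()) {
            faults |= checks[i].fault_bit;
        }
    }
    return faults;
}

typedef enum { DEV_MONITORING, DEV_FAULT } device_state_t;

/* Monitoring is entered only when no check failed. */
device_state_t state_after_self_test(uint32_t faults)
{
    return (faults == 0u) ? DEV_MONITORING : DEV_FAULT;
}

/* Stand-ins for real peripheral checks, used only to exercise the sketch. */
bool stub_check_pass(void) { return true; }
bool stub_check_fail(void) { return false; }

/* ---- Control-unit side: heartbeat watchdog ---- */
typedef struct {
    uint32_t last_heartbeat_ms;
    uint32_t timeout_ms;       /* heartbeat interval plus tolerance */
    bool     stove_powered;    /* mirrors the relay control line    */
} heartbeat_monitor_t;

void heartbeat_received(heartbeat_monitor_t *m, uint32_t now_ms)
{
    assert(m != NULL);
    m->last_heartbeat_ms = now_ms;
}

void heartbeat_poll(heartbeat_monitor_t *m, uint32_t now_ms)
{
    assert(m != NULL);
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - m->last_heartbeat_ms) > m->timeout_ms) {
        m->stove_powered = false;   /* fail safe: cut mains to the stove */
    }
}
```

The fault mask maps naturally onto coded LED and buzzer patterns, since each set bit identifies one subsystem.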
#5 Authenticated mobile control
The mobile app covers the conventional ground for a connected home device (commissioning, configuration, status, and alerts), but the security boundary at the BLE interface is anything but routine for this product category. An arbitrary BLE-capable device in radio range cannot connect to the system, change its parameters, or interfere with its operation; only app instances that have been paired and authenticated are admitted, and everything else is rejected at the protocol layer. The Nordic BLE security stack is combined with a custom application-level authentication procedure, designed explicitly against the threats that matter most for a connected safety device: unauthorised interception, data tampering, and rogue-device impersonation. Data integrity and user safety were treated as primary design constraints from the first line of the protocol, not as features layered on once the product worked.
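The admission rule itself can be shown in a few lines. This sketch models only the gate, under the assumption that a peer must have passed both the link-level pairing and the application-level authentication before any command is processed; it does not attempt to reproduce the authentication procedure, and every type and function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection record kept by the device. */
typedef struct {
    bool paired;              /* link-level pairing completed             */
    bool app_authenticated;   /* application-level procedure completed    */
} peer_t;

typedef enum { CMD_READ_STATUS, CMD_SET_PARAMETER } command_t;
typedef enum { RESULT_OK, RESULT_REJECTED } result_t;

/* Both conditions must hold; either one alone is not enough. */
bool peer_admitted(const peer_t *peer)
{
    assert(peer != NULL);
    return peer->paired && peer->app_authenticated;
}

/* Every command passes through the same gate before it can touch state. */
result_t handle_command(const peer_t *peer, command_t cmd,
                        uint8_t *parameter, uint8_t value)
{
    assert(parameter != NULL);
    if (!peer_admitted(peer)) {
        return RESULT_REJECTED;   /* unknown or half-authenticated peer */
    }
    if (cmd == CMD_SET_PARAMETER) {
        *parameter = value;
    }
    return RESULT_OK;
}
```

Placing the check in the single command entry point, rather than in each handler, means a new command cannot accidentally be exposed to unauthenticated peers.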