dish detection — sonostudio

Real-time Dish & Glass Detection for Interactive Restaurant Display

Project Overview

This project is a computer vision–driven interactive system developed for a restaurant client. It detects the presence of dishes and glasses on tables in real time and uses those events to trigger interactive digital content.

Rather than relying on touch interfaces or manual operation, the system responds directly to physical dining moments, allowing interactive content to unfold naturally and individually for each table.

Project type: Client work / interactive system
Intended audience: Restaurants and their guests

Goal & Intent

The project was initiated by a client request to automatically manage interactive table-based content without staff intervention, while providing a distinctive and memorable customer experience.

A central question guiding the work was how to build a system that is:

Simple and robust enough for real restaurant operations
Reliable under changing visual conditions, including projected video on tables
Deliverable within tight deadlines and budget constraints

Beyond automation, the goal was to create personalized, timing-based interactions in which content responds to the moment a dish or glass arrives at each table, rather than running synchronized effects across the entire space.

Technical / Architecture Description

System overview

Each table is monitored by a depth camera connected to a Raspberry Pi. Detection events are streamed via OSC to TouchDesigner, which manages interaction logic and triggers media playback.

Data flow

RGB-D capture via OAK-D Lite
Edge processing on Raspberry Pi (zone filtering, depth gating, noise suppression)
OSC event transmission (per table namespace)
Interaction logic in TouchDesigner
Media triggering via Arena
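
The OSC step of this flow can be sketched with the Python standard library alone. The address follows the /table1/dish_present pattern used for the edge node's output; the single integer argument (1 for present, 0 for absent), the function names, and the host and port in the usage note are assumptions made for illustration, not details taken from the project.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * ((-len(raw)) % 4)


def encode_presence(address: str, present: bool) -> bytes:
    """Build an OSC 1.0 message with one int32 argument (assumed: 1 = present, 0 = absent)."""
    return _osc_string(address) + _osc_string(",i") + struct.pack(">i", int(present))


def send_presence(host: str, port: int, address: str, present: bool) -> None:
    """Send one presence event as a single UDP datagram (fire-and-forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_presence(address, present), (host, port))
```

A call such as send_presence("192.168.0.10", 7000, "/table1/dish_present", True) would then notify a listener at that (hypothetical) address. Because UDP gives no delivery guarantee, a real deployment might resend the current state periodically.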

Technologies

Hardware: Raspberry Pi 5, OAK-D Lite
Software: Python, DepthAI SDK, TouchDesigner, Arena
Communication: OSC
Configuration: YAML-based

GitHub

https://github.com/sonostudio/dish-detection

┌───────────────────────────┐
│        TABLE SPACE        │
│                           │
│  Dish / Glass Placement   │
│                           │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│      OAK-D Lite Camera    │
│  (RGB + Depth Capture)    │
└─────────────┬─────────────┘
              │  USB
              ▼
┌────────────────────────────────────────┐
│        Raspberry Pi 5 (Edge Node)      │
│                                        │
│  [Python]                              │
│  - Zone-based calibration              │
│  - Depth gate filtering (min/max)      │
│  - Noise suppression (erosion)         │
│  - Presence detection (ON / OFF)       │
│                                        │
│  Output: Semantic events               │
│  (/table1/dish_present, etc.)          │
└─────────────┬──────────────────────────┘
              │  OSC (UDP)
              ▼
┌────────────────────────────────────────┐
│      TouchDesigner (Windows PC)        │
│                                        │
│  - Receives OSC from multiple Pis      │
│  - Maps camera → table → logic         │
│  - Timing, debounce, interaction flow  │
│                                        │
│  Output: Trigger commands              │
└─────────────┬──────────────────────────┘
              │  OSC
              ▼
┌───────────────────────────┐
│        Arena Media        │
│      (Playback Engine)    │
│                           │
│  - Plays table-specific   │
│    visual content         │
└───────────────────────────┘
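
The "Timing, debounce" responsibility in the TouchDesigner stage can be illustrated with a small time-based debouncer. TouchDesigner scripts are written in Python, but the class below is only a sketch: its name, the hold time, and the idea of passing the clock in explicitly are assumptions, and the project may implement this differently.

```python
from typing import Optional


class Debouncer:
    """Confirm a state change only after the raw signal has held its new value for hold_s seconds."""

    def __init__(self, hold_s: float = 0.5) -> None:
        self.hold_s = hold_s
        self.stable = False       # last confirmed state
        self._candidate = False   # most recent raw value
        self._since = 0.0         # time at which the candidate value first appeared

    def update(self, raw: bool, now: float) -> Optional[bool]:
        """Feed one raw reading; return the new stable state on a confirmed change, else None."""
        if raw != self._candidate:
            # The raw value flipped: restart the hold timer.
            self._candidate = raw
            self._since = now
        if self._candidate != self.stable and now - self._since >= self.hold_s:
            self.stable = self._candidate
            return self.stable
        return None
```

Feeding each incoming presence value through update() means a brief flicker does not start or stop content; only a value that persists for the hold time produces a trigger.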

Process

Research / Feasibility Testing

Initial testing focused on whether real-time object detection could remain stable in a visually complex environment. The emphasis was on feasibility, latency, and robustness rather than model sophistication.

Hardware Selection

The system was designed around edge computing to minimize latency and operational complexity.

Raspberry Pi 5 was selected as the on-site processing unit
OAK-D Lite depth cameras were chosen for their small form factor and low cost, while still providing reliable RGB-D data

This combination balanced performance, footprint, and budget for real-world deployment.

Calibration & Tuning

A zone-based calibration workflow was introduced not only to support different table layouts but also to enable semantic interpretation using simple depth-based detection. Because explicit spatial zones are defined on the table surface, the system can infer which object belongs to which guest position, so depth-based presence detection works as a lightweight form of semantic object detection. The same approach lets one camera serve multiple guests (for example, detecting dishes for two guests in a single view), which significantly reduces hardware requirements without sacrificing reliability.

In addition, depth gate parameters (minimum and maximum height thresholds) are applied per zone to exclude the table surface itself. This prevents false detections caused by projected visuals or surface texture and allows the system to focus exclusively on objects placed above the table.
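
A minimal sketch of per-zone depth gating follows. It assumes an overhead camera, depth values in millimetres, rectangular zones, and a pixel-count threshold for presence; the Zone fields and function names are hypothetical. It is written in plain Python so it runs without dependencies, whereas a real pipeline would more likely operate on NumPy arrays.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Zone:
    """Hypothetical per-guest zone: a pixel rectangle with its own depth gate."""
    name: str
    x0: int
    y0: int
    x1: int               # exclusive
    y1: int               # exclusive
    table_depth_mm: int   # camera-to-table distance recorded at calibration
    min_height_mm: int    # lower gate: rejects the table surface itself
    max_height_mm: int    # upper gate: rejects hands, arms, and anything taller
    min_pixels: int       # gated pixels required to report presence


def count_gated_pixels(depth_mm: Sequence[Sequence[int]], zone: Zone) -> int:
    """Count pixels in the zone whose height above the table lies inside the gate."""
    count = 0
    for row in depth_mm[zone.y0:zone.y1]:
        for d in row[zone.x0:zone.x1]:
            if d == 0:
                continue  # 0 marks an invalid depth reading
            height = zone.table_depth_mm - d
            if zone.min_height_mm <= height <= zone.max_height_mm:
                count += 1
    return count


def is_present(depth_mm: Sequence[Sequence[int]], zone: Zone) -> bool:
    """Report presence when enough pixels in the zone pass the depth gate."""
    return count_gated_pixels(depth_mm, zone) >= zone.min_pixels
```

Because the gate is defined on height rather than on colour or texture, imagery projected onto the table never enters the decision: a projected pixel still measures as table height and falls below min_height_mm.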

On-site Testing

The system was tested during actual restaurant operation to validate:

Detection accuracy during service
Responsiveness to dish placement and removal
Stability under continuous use

Production Hardening

To ensure unattended operation:

The system auto-starts on boot
Remote access supports maintenance without on-site intervention
Configuration files enable safe updates and scaling

Challenges & Learnings

Managing depth noise and false positives

Early versions of the system produced false detections due to depth noise, particularly around object edges and visually complex table surfaces. A key issue was invalid depth values (depth = 0) generated by stereo camera occlusion. These values were explicitly filtered out before detection to prevent false spikes, and morphological erosion was applied to remove small, isolated noise while preserving larger objects such as dishes and glasses.
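
The two noise measures described here can be sketched as follows, assuming depth in millimetres and a 3x3 structuring element (the kernel size used in the project is not stated). The function names are hypothetical, and the plain-Python loops are for illustration only; OpenCV's cv2.erode would be the usual choice in practice.

```python
from typing import List, Sequence


def valid_mask(depth_mm: Sequence[Sequence[int]], near_mm: int, far_mm: int) -> List[List[bool]]:
    """True where the reading is valid (non-zero) and lies inside [near_mm, far_mm]."""
    return [[d != 0 and near_mm <= d <= far_mm for d in row] for row in depth_mm]


def erode(mask: Sequence[Sequence[bool]]) -> List[List[bool]]:
    """3x3 binary erosion: a pixel survives only if it and all 8 neighbours are set.

    Border pixels are cleared, since they lack a full neighbourhood.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    out = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = all(
                mask[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
    return out
```

Isolated specks vanish after erosion because they have no fully set neighbourhood, while a dish-sized blob shrinks by one pixel on each side but remains detectable.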

Detection latency on object removal

Placement was detected immediately, but the state reset on removal lagged because unprocessed frames backed up in the camera queue. Configuring the camera queues to always serve the latest frame, discarding older ones, made the system responsive to real-world changes in both directions.
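
The "latest frame only" behaviour can be illustrated without camera hardware using a one-slot buffer. In DepthAI's v2 API this is typically requested with device.getOutputQueue(name, maxSize=1, blocking=False); whether the project uses exactly those values is an assumption, and the class below is a hypothetical stand-in rather than the SDK's queue.

```python
from collections import deque
from typing import Any, Optional


class LatestFrameQueue:
    """One-slot, non-blocking queue: a new frame silently replaces any unread older frame."""

    def __init__(self) -> None:
        self._buf: deque = deque(maxlen=1)

    def put(self, frame: Any) -> None:
        """Store a frame, discarding whatever was waiting."""
        self._buf.append(frame)

    def try_get(self) -> Optional[Any]:
        """Return the newest frame, or None if nothing has arrived since the last read."""
        return self._buf.popleft() if self._buf else None
```

With a deeper queue, a consumer that falls behind keeps processing frames in which the dish is still on the table; with a single slot, the next read always reflects the current scene.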

Scaling across multiple cameras

Running multiple cameras across Raspberry Pis required both logical and hardware-level coordination. Each camera is explicitly mapped using its unique serial number (MxID) to a dedicated OSC namespace, ensuring consistent table-to-signal routing. To address USB bandwidth constraints on the Raspberry Pi, camera frame rates were intentionally limited (e.g. 15 FPS, configurable), allowing multiple cameras to operate reliably on a single device.
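
A sketch of the serial-number routing described above. The MxID strings are placeholders, the CameraConfig fields and function name are hypothetical, and in the project this mapping would presumably live in the YAML configuration rather than in code.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CameraConfig:
    """Hypothetical per-camera settings keyed by the device's MxID."""
    namespace: str   # OSC namespace for the table this camera watches
    fps: int = 15    # capped frame rate to stay within USB bandwidth


# Placeholder serial numbers; real MxIDs are read from the devices.
CAMERAS: Dict[str, CameraConfig] = {
    "MXID_PLACEHOLDER_A": CameraConfig(namespace="/table1"),
    "MXID_PLACEHOLDER_B": CameraConfig(namespace="/table2"),
}


def osc_address(mxid: str, event: str, cameras: Dict[str, CameraConfig] = CAMERAS) -> str:
    """Resolve a camera serial number and event name to a table-specific OSC address."""
    try:
        namespace = cameras[mxid].namespace
    except KeyError:
        raise LookupError(f"camera {mxid!r} is not mapped to a table") from None
    return f"{namespace}/{event}"
```

Failing loudly on an unknown serial number means a swapped or newly added camera cannot silently send events into another table's namespace.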

Key learning

In a controlled environment, simple depth-based detection can outperform heavier object detection models in reliability and maintainability. At the same time, the system architecture leaves room for future expansion using semantic object detection when richer interaction logic is required.

Output

Final system

Two Raspberry Pi units running detection and calibration tools
Six OAK-D Lite cameras monitoring individual tables
Python-based calibration GUI and real-time detection service
A Windows PC running TouchDesigner for interaction logic
OSC-based triggering to Arena for media playback

The system separates sensing, interaction logic, and media control, making it modular and maintainable.

Audience experience

For guests, the interaction feels responsive, subtle, and reliable. Content reacts naturally to dining events without visible sensors or explicit input, and timing differs per guest, which reinforces a sense of personal interaction rather than spectacle.

AEST

151-0051 Tokyo
Shibuya-ku, Sendagaya 3-30-9