Project Overview
This project is a computer vision–driven interactive system developed for a restaurant client. It detects the presence of dishes and glasses on tables in real time and uses those events to trigger interactive digital content.
Rather than relying on touch interfaces or manual operation, the system responds directly to physical dining moments, allowing interactive content to unfold naturally and individually for each table.
- Project type: Client work / interactive system
- Intended audience: Restaurants and their guests
Goal & Intent
The project was initiated by a client request to automatically manage interactive table-based content without staff intervention, while providing a distinctive and memorable customer experience.
A central question guiding the work was how to build a system that is:
- Simple and robust enough for real restaurant operations
- Reliable under changing visual conditions, including projected video on tables
- Deliverable within tight deadlines and budget constraints
Beyond automation, the goal was to create personalized, timing-based interactions in which content responds to the moment a dish or glass arrives at each table, rather than running synchronized effects across the entire space.
Technical / Architecture Description
System overview
Each table is monitored by a depth camera connected to a Raspberry Pi. Detection events are streamed via OSC to TouchDesigner, which manages interaction logic and triggers media playback.
Data flow
- RGB-D capture via OAK-D Lite
- Edge processing on Raspberry Pi (zone filtering, depth gating, noise suppression)
- OSC event transmission (per table namespace)
- Interaction logic in TouchDesigner
- Media triggering via Arena
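The OSC step in this flow can be sketched with the standard library alone. This is a minimal illustration, not the project's code: it assumes each presence event is sent as one OSC message with a single int32 argument (1 for present, 0 for absent). The function names, host, and port are hypothetical.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * ((-len(data)) % 4)


def encode_presence(address: str, present: bool) -> bytes:
    """Build an OSC message with one int32 argument (assumed: 1 = present, 0 = absent)."""
    return _osc_string(address) + _osc_string(",i") + struct.pack(">i", int(present))


def send_presence(host: str, port: int, address: str, present: bool) -> None:
    """Send the encoded message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_presence(address, present), (host, port))
```

A call such as `send_presence("192.168.0.10", 7000, "/table1/dish_present", True)` would deliver one event to a listener on that (hypothetical) address. Because each table has its own address prefix, the receiver can route on the address alone.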
Technologies
- Hardware: Raspberry Pi 5, OAK-D Lite
- Software: Python, DepthAI SDK, TouchDesigner, Arena
- Communication: OSC
- Configuration: YAML-based
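As an illustration of what a per-camera configuration might hold once the YAML file has been parsed into a dictionary, the sketch below validates one entry. Every field name here (`mxid`, `osc_namespace`, `fps`, `zones`, `min_height_mm`, `max_height_mm`) is an assumption made for the example; the project's real schema is not documented in this text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str            # hypothetical label, e.g. "guest_left"
    rect: tuple          # (x, y, width, height) in pixels
    min_height_mm: int   # lower bound of the depth gate, above the table
    max_height_mm: int   # upper bound of the depth gate, above the table


@dataclass(frozen=True)
class CameraConfig:
    mxid: str
    osc_namespace: str
    fps: int
    zones: tuple


def parse_camera(raw: dict) -> CameraConfig:
    """Turn one parsed config entry into typed objects, rejecting inverted gates."""
    zones = []
    for z in raw["zones"]:
        if z["min_height_mm"] >= z["max_height_mm"]:
            raise ValueError(f"zone {z['name']}: min_height_mm must be below max_height_mm")
        zones.append(Zone(z["name"], tuple(z["rect"]), z["min_height_mm"], z["max_height_mm"]))
    return CameraConfig(raw["mxid"], raw["osc_namespace"], int(raw.get("fps", 15)), tuple(zones))


# Placeholder values only; a real file would carry calibrated numbers.
EXAMPLE = {
    "mxid": "MXID_PLACEHOLDER_1",
    "osc_namespace": "/table1",
    "zones": [{"name": "guest_left", "rect": [40, 60, 200, 200],
               "min_height_mm": 10, "max_height_mm": 150}],
}
```

Validating at load time means a mistyped threshold fails at start-up rather than silently disabling detection during service.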
┌───────────────────────────┐
│        TABLE SPACE        │
│                           │
│  Dish / Glass Placement   │
│                           │
└─────────────┬─────────────┘
              │
              ▼
┌───────────────────────────┐
│     OAK-D Lite Camera     │
│   (RGB + Depth Capture)   │
└─────────────┬─────────────┘
              │ USB
              ▼
┌────────────────────────────────────────┐
│       Raspberry Pi 5 (Edge Node)       │
│                                        │
│  [Python]                              │
│  - Zone-based calibration              │
│  - Depth gate filtering (min/max)      │
│  - Noise suppression (erosion)         │
│  - Presence detection (ON / OFF)       │
│                                        │
│  Output: Semantic events               │
│  (/table1/dish_present, etc.)          │
└─────────────┬──────────────────────────┘
              │ OSC (UDP)
              ▼
┌────────────────────────────────────────┐
│       TouchDesigner (Windows PC)       │
│                                        │
│  - Receives OSC from multiple Pis      │
│  - Maps camera → table → logic         │
│  - Timing, debounce, interaction flow  │
│                                        │
│  Output: Trigger commands              │
└─────────────┬──────────────────────────┘
              │ OSC
              ▼
┌───────────────────────────┐
│        Arena Media        │
│     (Playback Engine)     │
│                           │
│  - Plays table-specific   │
│    visual content         │
└───────────────────────────┘
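The "debounce" step in the TouchDesigner stage can be pictured as a small state machine that accepts a change only after the incoming signal has held steady for a set time. The sketch below is a generic illustration in plain Python; the class name, the `hold_s` parameter, and the hold duration are assumptions, and the actual TouchDesigner network may implement this differently.

```python
class Debouncer:
    """Confirm a state change only after the raw signal has held steady for hold_s seconds."""

    def __init__(self, hold_s: float, initial: bool = False):
        self.hold_s = hold_s
        self.state = initial        # last confirmed state
        self._candidate = initial   # most recent raw value seen
        self._since = None          # time at which the candidate first appeared

    def update(self, raw: bool, now: float):
        """Feed one raw reading; return the new state if it just changed, else None."""
        if raw == self.state:
            # Signal agrees with the confirmed state: discard any pending change.
            self._candidate, self._since = raw, None
            return None
        if raw != self._candidate or self._since is None:
            # A new candidate value: start timing it.
            self._candidate, self._since = raw, now
            return None
        if now - self._since >= self.hold_s:
            self.state, self._since = raw, None
            return self.state
        return None
```

With `hold_s=0.5`, a dish that flickers into view for 0.2 s produces no event, while one that stays for half a second produces exactly one.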
Process
Research / Feasibility Testing
Initial testing focused on whether real-time object detection could remain stable in a visually complex environment. The emphasis was on feasibility, latency, and robustness rather than model sophistication.
Hardware Selection
The system was designed around edge computing to minimize latency and operational complexity.
- Raspberry Pi 5 was selected as the on-site processing unit
- OAK-D Lite depth cameras were chosen for their smaller form factor and lower cost, while still providing reliable RGB-D data
This combination balanced performance, footprint, and budget for real-world deployment.
Calibration & Tuning
A zone-based calibration workflow was introduced not only to support different table layouts but also to enable semantic interpretation using simple depth-based detection. By defining explicit spatial zones on the table surface, the system can infer which object belongs to which guest position, so depth-based presence detection functions as a lightweight form of semantic object detection. This approach also lets a single camera view serve multiple guests (for example, detecting dishes for two guests with one camera), which significantly reduces hardware requirements without sacrificing reliability.
In addition, depth gate parameters (minimum and maximum height thresholds) are applied per zone to exclude the table surface itself. This prevents false detections caused by projected visuals or surface texture and allows the system to focus exclusively on objects placed above the table.
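A minimal sketch of the zone and depth-gate idea follows, under several assumptions: the depth frame is a 2D grid of camera-to-surface distances in millimetres, the camera-to-table distance is known from calibration, and a zone counts as occupied once a minimum number of pixels fall inside the height gate. The function name, parameters, and numbers are illustrative. Plain lists keep the example dependency-free.

```python
def zone_occupied(depth_mm, zone, table_mm, min_h_mm, max_h_mm, min_pixels):
    """Return True if enough pixels in `zone` lie between min_h and max_h above the table.

    depth_mm : 2D list of distances from the camera, in millimetres
    zone     : (x, y, width, height) rectangle in pixel coordinates
    table_mm : calibrated camera-to-table distance (assumed known)
    """
    x0, y0, w, h = zone
    hits = 0
    for row in depth_mm[y0:y0 + h]:
        for d in row[x0:x0 + w]:
            height = table_mm - d  # how far the surface rises above the table
            if min_h_mm <= height <= max_h_mm:
                hits += 1
    return hits >= min_pixels


# One camera, two guest positions: an empty 4x6 table at 800 mm...
frame = [[800] * 6 for _ in range(4)]
# ...with a 30 mm tall object covering four pixels on the left side.
for y in (1, 2):
    for x in (0, 1):
        frame[y][x] = 770

LEFT, RIGHT = (0, 0, 3, 4), (3, 0, 3, 4)
left_on = zone_occupied(frame, LEFT, 800, 10, 150, min_pixels=3)
right_on = zone_occupied(frame, RIGHT, 800, 10, 150, min_pixels=3)
```

Here `left_on` is true and `right_on` is false: the table surface itself sits at height 0, below the minimum threshold, so only the raised object registers, and only in the zone that contains it.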
On-site Testing
The system was tested during actual restaurant operation to validate:
- Detection accuracy during service
- Responsiveness to dish placement and removal
- Stability under continuous use
Production Hardening
To ensure unattended operation:
- The system auto-starts on boot
- Remote access supports maintenance without on-site intervention
- Configuration files enable safe updates and scaling
Challenges & Learnings
Managing depth noise and false positives
Early versions of the system produced false detections due to depth noise, particularly around object edges and visually complex table surfaces. A key issue was invalid depth values (depth = 0) generated by stereo camera occlusion. These values were explicitly filtered out before detection to prevent false spikes, and morphological erosion was applied to remove small, isolated noise while preserving larger objects such as dishes and glasses.
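The two filters described here can be sketched as follows. This is an illustrative version in plain Python, assuming a simple "closer than a threshold" mask and a 3x3 erosion kernel. The threshold, kernel size, and function names are assumptions, and a production version would more likely use an image-processing library.

```python
def object_mask(depth_mm, threshold_mm):
    """1 where a valid reading is closer to the camera than threshold_mm, else 0.

    A reading of 0 means "no depth available", not "touching the lens",
    so it is rejected before the comparison.
    """
    return [[1 if d != 0 and d < threshold_mm else 0 for d in row] for row in depth_mm]


def erode(mask):
    """3x3 binary erosion: keep a pixel only if it and all 8 neighbours are set.

    Border pixels are always cleared because part of their neighbourhood
    lies outside the image.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out


# 7x7 table at 800 mm with a 3x3 object, one isolated noisy pixel,
# and one invalid (zero) reading.
frame = [[800] * 7 for _ in range(7)]
for y in (1, 2, 3):
    for x in (1, 2, 3):
        frame[y][x] = 760
frame[5][5] = 760   # isolated noise
frame[0][6] = 0     # invalid depth from occlusion

mask = object_mask(frame, threshold_mm=790)
cleaned = erode(mask)
```

Without the `d != 0` check, the zero reading would pass the "closer than threshold" test and appear as an object. After erosion, the isolated pixel disappears while the core of the 3x3 object survives.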
Detection latency on object removal
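Although placement was detected immediately, the state reset after an object was removed lagged because frames backed up in the camera queues, so the detector was still processing stale images. Configuring the camera queues to always serve the latest frame removed this backlog and made the system responsive to real-world changes.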
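The effect of a "latest frame only" queue can be shown without camera hardware. The sketch below uses a one-slot buffer from the standard library. In the DepthAI v2 Python API a comparable behaviour is usually requested with a small, non-blocking output queue, for example `getOutputQueue(name=..., maxSize=1, blocking=False)`, but the exact settings used in this project are not stated and the class below is purely illustrative.

```python
from collections import deque


class LatestFrameQueue:
    """Holds at most one frame; a newly arrived frame replaces any unread one."""

    def __init__(self):
        self._buf = deque(maxlen=1)

    def put(self, frame):
        # With maxlen=1, appending silently discards the previous frame.
        self._buf.append(frame)

    def try_get(self):
        """Return the newest frame, or None if nothing has arrived since the last read."""
        return self._buf.pop() if self._buf else None


# The camera produces five frames while the detector is busy.
fifo = deque()
latest = LatestFrameQueue()
for frame_id in range(5):
    fifo.append(frame_id)
    latest.put(frame_id)

stale = fifo.popleft()     # a FIFO hands back the oldest frame first
fresh = latest.try_get()   # the one-slot queue hands back the newest
```

In this run the FIFO consumer is four frames behind, which is the kind of lag that delays an OFF event after a dish is removed. The one-slot queue trades dropped frames for a detector that always sees the current scene.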
Scaling across multiple cameras
Running multiple cameras across Raspberry Pis required both logical and hardware-level coordination. Each camera is explicitly mapped using its unique serial number (MxID) to a dedicated OSC namespace, ensuring consistent table-to-signal routing. To address USB bandwidth constraints on the Raspberry Pi, camera frame rates were intentionally limited (e.g. 15 FPS, configurable), allowing multiple cameras to operate reliably on a single device.
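The serial-number routing can be sketched as a lookup that fails fast when a connected camera has no assigned table. The MxID strings below are placeholders, and the function name and checks are assumptions for illustration, not the project's code.

```python
def build_routes(mapping: dict, connected: list) -> dict:
    """Return {mxid: osc_namespace} for the connected cameras.

    Raises KeyError if a connected camera has no table assigned, and
    ValueError if two connected cameras would publish to the same namespace.
    """
    unknown = [m for m in connected if m not in mapping]
    if unknown:
        raise KeyError(f"no table assigned to camera(s): {unknown}")
    namespaces = [mapping[m] for m in connected]
    if len(set(namespaces)) != len(namespaces):
        raise ValueError("two cameras share one OSC namespace")
    return {m: mapping[m] for m in connected}


# Placeholder serials; real MxIDs are read from the devices.
MAPPING = {
    "MXID_PLACEHOLDER_A": "/table1",
    "MXID_PLACEHOLDER_B": "/table2",
    "MXID_PLACEHOLDER_C": "/table3",
}

# The result does not depend on the order in which cameras were discovered.
routes = build_routes(MAPPING, ["MXID_PLACEHOLDER_C", "MXID_PLACEHOLDER_A"])
```

Keying on the serial number rather than on the USB port or discovery order means a camera keeps its table assignment after a reboot or a cable swap.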
Key learning
In a controlled environment, simple depth-based detection can outperform heavier object detection models in reliability and maintainability. At the same time, the system architecture leaves room for future expansion using semantic object detection when richer interaction logic is required.
Output
Final system
- Two Raspberry Pi units running detection and calibration tools
- Six OAK-D Lite cameras monitoring individual tables
- Python-based calibration GUI and real-time detection service
- A Windows PC running TouchDesigner for interaction logic
- OSC-based triggering to Arena for media playback
The system separates sensing, interaction logic, and media control, making it modular and maintainable.
Audience experience
For guests, the interaction feels responsive, subtle, and reliable. Content reacts naturally to dining events without visible sensors or explicit input, and timing differs per guest — reinforcing a sense of personal interaction rather than spectacle.