BeyondBinary

Accessibility-first communication platform bridging deaf, blind, deafblind, and mute users through real-time AI

Python · Next.js · TensorFlow · MediaPipe · WebRTC · FastAPI

Overview

BeyondBinary is an accessibility-first communication platform that bridges the gap between deaf, blind, deafblind, and mute users through real-time AI. It combines ASL sign language detection, speech-to-text, text-to-speech, emotional tone analysis, braille output, and peer-to-peer video calling into a single live workspace.

The Problem

Current assistive technologies focus on single-modality solutions—speech-to-text, basic sign recognition, or simple navigation aids—that don't address the complex, multi-layered needs of users with disabilities. These fragmented tools fail to account for regional sign language variations, contextual nuances, and the reality that many users need multiple modalities working together simultaneously.

BeyondBinary tackles this by combining vision, audio, text, haptics, and AI into one cohesive system that adapts to individual needs.

How It Works

Users select an accessibility profile during onboarding, which activates different input/output channels:

Deaf Profile

  • Receives: Large captions, sign interpretation, tone indicators
  • Sends: Text, message cards
  • Features: Tone emoji badges, visual-first layout

Blind Profile

  • Receives: Speech narration, braille output, tone identification
  • Sends: Text-to-speech
  • Features: Audio guidance, ElevenLabs voice

Deafblind Profile

  • Receives: Braille (always-on), optional audio, tone labels
  • Sends: Text-to-speech, message cards
  • Features: 12-cell braille display

Mute Profile

  • Receives: Captions, sign interpretation, audio context
  • Sends: Text-to-speech, text output
  • Features: Quick reply buttons
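The profile-to-channel mapping above can be sketched as a small configuration table. This is an illustrative sketch, not the app's actual data model; the channel and feature names are assumptions drawn from the lists above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """Illustrative accessibility profile: which I/O channels are active."""
    receives: frozenset[str]
    sends: frozenset[str]
    features: frozenset[str]

# Channel/feature names are hypothetical labels for the items listed above.
PROFILES = {
    "deaf": Profile(
        receives=frozenset({"captions", "sign_interpretation", "tone_indicators"}),
        sends=frozenset({"text", "message_cards"}),
        features=frozenset({"tone_emoji_badges", "visual_first_layout"}),
    ),
    "blind": Profile(
        receives=frozenset({"speech_narration", "braille", "tone_identification"}),
        sends=frozenset({"text_to_speech"}),
        features=frozenset({"audio_guidance", "elevenlabs_voice"}),
    ),
    "deafblind": Profile(
        receives=frozenset({"braille", "optional_audio", "tone_labels"}),
        sends=frozenset({"text_to_speech", "message_cards"}),
        features=frozenset({"braille_display_12_cell"}),
    ),
    "mute": Profile(
        receives=frozenset({"captions", "sign_interpretation", "audio_context"}),
        sends=frozenset({"text_to_speech", "text"}),
        features=frozenset({"quick_reply_buttons"}),
    ),
}

def channel_active(profile_name: str, channel: str) -> bool:
    """True if an output channel is active for the given profile."""
    return channel in PROFILES[profile_name].receives
```

Selecting a profile during onboarding would then amount to looking up one of these entries and enabling only its channels.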

Technical Highlights

ASL Sign Detection

Real-time detection of 12 ASL signs via webcam at 5 FPS:

  • Pipeline: webcam frame → MediaPipe Holistic (1662 landmarks) → 30-frame sliding window → LSTM classifier → stability filter (5 consecutive frames) → confirmed sign
  • Signs: Hello, Thank You, Help, Yes, No, Please, Sorry, I Love You, Stop, More, How Are You, Good

Voice + Tone Intelligence

  • Speech-to-text: Groq Whisper (~200ms latency), OpenAI Whisper fallback
  • Emotional tone analysis: Hume AI prosody (~300ms), AFINN sentiment fallback
  • Text-to-speech: ElevenLabs multilingual v2, Web Speech API fallback
  • Intelligence: Groq Llama 3.3 70B for jargon simplification and quick reply generation
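Each of these services is paired with a fallback, which suggests a simple provider-chain pattern. A minimal sketch of that pattern for speech-to-text, with the actual API calls abstracted behind callables (the provider functions here are assumptions, not the app's real client code):

```python
import logging

def transcribe_with_fallback(audio: bytes, providers) -> str:
    """Try each (name, fn) speech-to-text provider in order, e.g. Groq
    Whisper first, OpenAI Whisper second; return the first transcript."""
    last_err = None
    for name, fn in providers:
        try:
            return fn(audio)
        except Exception as err:
            logging.warning("STT provider %s failed: %s", name, err)
            last_err = err
    raise RuntimeError("all speech-to-text providers failed") from last_err
```

The same chain shape would apply to tone analysis (Hume AI, then AFINN) and text-to-speech (ElevenLabs, then the Web Speech API on the client).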

Braille Display

Visual 6-dot UEB Grade 1 braille cells rendered in the browser. A 12-cell scrolling display converts conversation text to braille in real time.
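For Grade 1 letters, the conversion maps each letter to its dot pattern; the Unicode braille block (U+2800) encodes dot n as bit n−1, so the cells can be rendered directly as text. A sketch of that mapping, covering letters only (the real display presumably also handles digits, capitals, and punctuation):

```python
# Dot numbers for UEB Grade 1 letters a-z (6-dot cells).
LETTER_DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5), "k": (1, 3), "l": (1, 2, 3), "m": (1, 3, 4),
    "n": (1, 3, 4, 5), "o": (1, 3, 5), "p": (1, 2, 3, 4),
    "q": (1, 2, 3, 4, 5), "r": (1, 2, 3, 5), "s": (2, 3, 4),
    "t": (2, 3, 4, 5), "u": (1, 3, 6), "v": (1, 2, 3, 6),
    "w": (2, 4, 5, 6), "x": (1, 3, 4, 6), "y": (1, 3, 4, 5, 6),
    "z": (1, 3, 5, 6),
}

def to_braille(text: str) -> str:
    """Map lowercase letters to Unicode braille cells (U+2800 block);
    characters outside the table become a blank cell."""
    cells = []
    for ch in text.lower():
        dots = LETTER_DOTS.get(ch, ())
        bits = sum(1 << (d - 1) for d in dots)  # bit n-1 encodes dot n
        cells.append(chr(0x2800 + bits))
    return "".join(cells)

def visible_cells(braille: str, offset: int, width: int = 12) -> str:
    """Slice the cell string for a 12-cell scrolling display window."""
    return braille[offset:offset + width]
```

Scrolling then reduces to advancing `offset` through the cell string as new conversation text arrives.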

Video Calling

WebRTC peer-to-peer video with STUN/TURN relay. Signaling runs through a backend WebSocket.
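The signaling side of this can be sketched as a room-based relay: peers in the same call exchange SDP offers/answers and ICE candidates through the server, which never inspects them. This is an illustrative in-memory sketch of the relay core, not the app's actual FastAPI endpoint; in practice each queue would drain into a WebSocket connection.

```python
import asyncio
import json

class SignalingHub:
    """Room-based relay for WebRTC signaling messages (sketch)."""

    def __init__(self):
        self.rooms: dict[str, dict[str, asyncio.Queue]] = {}

    def join(self, room: str, peer_id: str) -> asyncio.Queue:
        """Register a peer in a room; the returned queue feeds its socket."""
        q: asyncio.Queue = asyncio.Queue()
        self.rooms.setdefault(room, {})[peer_id] = q
        return q

    async def relay(self, room: str, sender: str, message: dict) -> None:
        """Forward an offer/answer/candidate to every other peer in the room."""
        for peer_id, q in self.rooms.get(room, {}).items():
            if peer_id != sender:
                await q.put(json.dumps({"from": sender, **message}))

    def leave(self, room: str, peer_id: str) -> None:
        self.rooms.get(room, {}).pop(peer_id, None)
```

Once the offer/answer/candidate exchange completes, media flows peer-to-peer (directly or via the TURN relay), and the hub is only needed again for renegotiation or teardown.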

Tech Stack

  • Frontend: Next.js 16, React 19, TypeScript 5.9, Tailwind CSS 4
  • Backend: FastAPI 0.128, Python 3.12, Pydantic 2
  • ML Server: TensorFlow 2.16, MediaPipe 0.10.21, LSTM

External Services

  • Speech-to-text: Groq (Whisper Large v3 Turbo)
  • Text-to-speech: ElevenLabs (multilingual v2)
  • Tone analysis: Hume AI (Expression Measurement)
  • Intelligence: Groq (Llama 3.3 70B)