BeyondBinary

Accessibility-first communication platform bridging deaf, blind, deafblind, and mute users through real-time AI

Python · Next.js · TensorFlow · MediaPipe · WebRTC · FastAPI

Overview

BeyondBinary is an accessibility-first communication platform that bridges the gap between deaf, blind, deafblind, and mute users through real-time AI. It combines ASL sign language detection, speech-to-text, text-to-speech, emotional tone analysis, braille output, and peer-to-peer video calling into a single live workspace.

The Problem

Current assistive technologies focus on single-modality solutions—speech-to-text, basic sign recognition, or simple navigation aids—that don't address the complex, multi-layered needs of users with disabilities. These fragmented tools fail to account for regional sign language variations, contextual nuances, and the reality that many users need multiple modalities working together simultaneously.

BeyondBinary tackles this by combining vision, audio, text, haptics, and AI into one cohesive system that adapts to individual needs.

How It Works

Users select an accessibility profile during onboarding, which activates different input/output channels:

Deaf Profile

  • Receives: Large captions, sign interpretation, tone indicators
  • Sends: Text, message cards
  • Features: Tone emoji badges, visual-first layout

Blind Profile

  • Receives: Speech narration, braille output, tone identification
  • Sends: Text-to-speech
  • Features: Audio guidance, ElevenLabs voice

Deafblind Profile

  • Receives: Braille (always-on), optional audio, tone labels
  • Sends: Text-to-speech, message cards
  • Features: 12-cell braille display

Mute Profile

  • Receives: Captions, sign interpretation, audio context
  • Sends: Text-to-speech, text output
  • Features: Quick reply buttons
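The profile-to-channel mapping above can be sketched as a small configuration table. This is an illustrative sketch, not the app's actual data model; the channel and feature names are assumptions drawn from the lists above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """Illustrative accessibility profile: which I/O channels are active."""
    receives: frozenset[str]
    sends: frozenset[str]
    features: frozenset[str]

# Channel/feature names are hypothetical labels for the items listed above.
PROFILES = {
    "deaf": Profile(
        receives=frozenset({"captions", "sign_interpretation", "tone_indicators"}),
        sends=frozenset({"text", "message_cards"}),
        features=frozenset({"tone_emoji_badges", "visual_first_layout"}),
    ),
    "blind": Profile(
        receives=frozenset({"speech_narration", "braille", "tone_identification"}),
        sends=frozenset({"text_to_speech"}),
        features=frozenset({"audio_guidance", "elevenlabs_voice"}),
    ),
    "deafblind": Profile(
        receives=frozenset({"braille", "optional_audio", "tone_labels"}),
        sends=frozenset({"text_to_speech", "message_cards"}),
        features=frozenset({"braille_display_12_cell"}),
    ),
    "mute": Profile(
        receives=frozenset({"captions", "sign_interpretation", "audio_context"}),
        sends=frozenset({"text_to_speech", "text"}),
        features=frozenset({"quick_reply_buttons"}),
    ),
}

def channel_active(profile_name: str, channel: str) -> bool:
    """True if an output channel is active for the given profile."""
    return channel in PROFILES[profile_name].receives
```

Selecting a profile during onboarding would then amount to looking up one of these entries and enabling only its channels.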

Technical Highlights

ASL Sign Detection

Real-time detection of 12 ASL signs via webcam at 5 FPS:

  • Pipeline: webcam frame → MediaPipe Holistic (1662 landmarks) → 30-frame sliding window → LSTM classifier → stability filter (5 consecutive frames) → confirmed sign
  • Signs: Hello, Thank You, Help, Yes, No, Please, Sorry, I Love You, Stop, More, How Are You, Good

Voice + Tone Intelligence

  • Speech-to-text: Groq Whisper (~200ms latency), OpenAI Whisper fallback
  • Emotional tone analysis: Hume AI prosody (~300ms), AFINN sentiment fallback
  • Text-to-speech: ElevenLabs multilingual v2, Web Speech API fallback
  • Intelligence: Groq Llama 3.3 70B for jargon simplification and quick reply generation
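Each of these services is paired with a fallback, which suggests a simple provider-chain pattern. A minimal sketch of that pattern for speech-to-text, with the actual API calls abstracted behind callables (the provider functions here are assumptions, not the app's real client code):

```python
import logging

def transcribe_with_fallback(audio: bytes, providers) -> str:
    """Try each (name, fn) speech-to-text provider in order, e.g. Groq
    Whisper first, OpenAI Whisper second; return the first transcript."""
    last_err = None
    for name, fn in providers:
        try:
            return fn(audio)
        except Exception as err:
            logging.warning("STT provider %s failed: %s", name, err)
            last_err = err
    raise RuntimeError("all speech-to-text providers failed") from last_err
```

The same chain shape would apply to tone analysis (Hume AI, then AFINN) and text-to-speech (ElevenLabs, then the Web Speech API on the client).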

Braille Display

Visual 6-dot UEB Grade 1 braille cells rendered in the browser. A 12-cell scrolling display converts conversation text to braille in real time.
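For Grade 1 letters, the conversion maps each letter to its dot pattern; the Unicode braille block (U+2800) encodes dot n as bit n−1, so the cells can be rendered directly as text. A sketch of that mapping, covering letters only (the real display presumably also handles digits, capitals, and punctuation):

```python
# Dot numbers for UEB Grade 1 letters a-z (6-dot cells).
LETTER_DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5), "k": (1, 3), "l": (1, 2, 3), "m": (1, 3, 4),
    "n": (1, 3, 4, 5), "o": (1, 3, 5), "p": (1, 2, 3, 4),
    "q": (1, 2, 3, 4, 5), "r": (1, 2, 3, 5), "s": (2, 3, 4),
    "t": (2, 3, 4, 5), "u": (1, 3, 6), "v": (1, 2, 3, 6),
    "w": (2, 4, 5, 6), "x": (1, 3, 4, 6), "y": (1, 3, 4, 5, 6),
    "z": (1, 3, 5, 6),
}

def to_braille(text: str) -> str:
    """Map lowercase letters to Unicode braille cells (U+2800 block);
    characters outside the table become a blank cell."""
    cells = []
    for ch in text.lower():
        dots = LETTER_DOTS.get(ch, ())
        bits = sum(1 << (d - 1) for d in dots)  # bit n-1 encodes dot n
        cells.append(chr(0x2800 + bits))
    return "".join(cells)

def visible_cells(braille: str, offset: int, width: int = 12) -> str:
    """Slice the cell string for a 12-cell scrolling display window."""
    return braille[offset:offset + width]
```

Scrolling then reduces to advancing `offset` through the cell string as new conversation text arrives.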

Video Calling

WebRTC peer-to-peer video with STUN/TURN relay. Signaling runs through a backend WebSocket.
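The signaling side of this can be sketched as a room-based relay: peers in the same call exchange SDP offers/answers and ICE candidates through the server, which never inspects them. This is an illustrative in-memory sketch of the relay core, not the app's actual FastAPI endpoint; in practice each queue would drain into a WebSocket connection.

```python
import asyncio
import json

class SignalingHub:
    """Room-based relay for WebRTC signaling messages (sketch)."""

    def __init__(self):
        self.rooms: dict[str, dict[str, asyncio.Queue]] = {}

    def join(self, room: str, peer_id: str) -> asyncio.Queue:
        """Register a peer in a room; the returned queue feeds its socket."""
        q: asyncio.Queue = asyncio.Queue()
        self.rooms.setdefault(room, {})[peer_id] = q
        return q

    async def relay(self, room: str, sender: str, message: dict) -> None:
        """Forward an offer/answer/candidate to every other peer in the room."""
        for peer_id, q in self.rooms.get(room, {}).items():
            if peer_id != sender:
                await q.put(json.dumps({"from": sender, **message}))

    def leave(self, room: str, peer_id: str) -> None:
        self.rooms.get(room, {}).pop(peer_id, None)
```

Once the offer/answer/candidate exchange completes, media flows peer-to-peer (directly or via the TURN relay), and the hub is only needed again for renegotiation or teardown.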

Tech Stack

  • Frontend: Next.js 16, React 19, TypeScript 5.9, Tailwind CSS 4
  • Backend: FastAPI 0.128, Python 3.12, Pydantic 2
  • ML Server: TensorFlow 2.16, MediaPipe 0.10.21, LSTM

External Services

  • Speech-to-text: Groq (Whisper Large v3 Turbo)
  • Text-to-speech: ElevenLabs (multilingual v2)
  • Tone analysis: Hume AI (Expression Measurement)
  • Intelligence: Groq (Llama 3.3 70B)