TECHNICAL WORKSHOP

Think & Speak AI
English Learning Platform

Technical Architecture Deep-Dive for Educators

Spring Boot · React · Azure Cognitive Services · Multi-LLM AI · Enterprise Security

MODULE 1

Enterprise-Grade AI Infrastructure

Built for Thousands of Concurrent Learners — The technical foundation powering the platform's core features

Cloud-Native Core

Spring Boot 3.3.10 on Java 21 — reactive programming for real-time educational interactions, WebSocket support for live classroom sessions.

  • Spring WebFlux for non-blocking request handling during peak HKDSE prep season
  • Flyway schema versioning ensures reliable database migrations
  • Redis session management with dynamic node allocation and round-robin load balancing
  • Heartbeat monitoring every 30 seconds, with automatic failover within 100ms of a detected failure
🧠
Multi-LLM AI Engine

99.9% availability through intelligent fallback. Azure OpenAI for production quality, ZhiPuAI GLM-4-Flash for cultural context and backup.

  • Primary pool: Azure OpenAI endpoints with continuous health monitoring
  • Secondary: traffic reroutes to the remaining healthy nodes when a failure is detected
  • Tertiary: ZhiPuAI GLM-4-Flash as culturally-aware HK backup
  • Automatic provider switching within milliseconds — no classroom disruption
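The fallback order described above can be sketched as a priority list that is walked until one provider answers. This is a minimal illustration, not the platform's implementation: the `Provider` type, its `is_healthy` and `complete` callables, and the provider names used in the test are all hypothetical stand-ins for real Azure OpenAI and ZhiPuAI clients.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Provider:
    name: str                        # label only (hypothetical)
    is_healthy: Callable[[], bool]   # health probe, assumed to be cheap
    complete: Callable[[str], str]   # sends a prompt and returns the reply


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order and return (provider_name, reply)."""
    errors: List[str] = []
    for provider in providers:
        if not provider.is_healthy():
            errors.append(f"{provider.name}: unhealthy")
            continue
        try:
            return provider.name, provider.complete(prompt)
        except Exception as exc:  # any provider error moves on to the next one
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Ordering the list as primary pool first and backup last keeps the routing policy in data rather than in branching code, so adding or removing a provider does not change the function.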
🛡️
Media Asset Security

AWS S3-compatible storage (DigitalSpaces). Presigned URLs with 60-minute expiry for learner data isolation and secure access.

  • DigitalSpaces integration for secure media storage
  • Temporary presigned URLs — no permanent public links to student recordings
  • Private learner data isolation from public content
  • FFmpeg-based format normalization (WebM → WAV, 16kHz PCM)
MODULE 2

Phoneme-Level Speech Analysis

Beyond Correct/Incorrect — Understanding how students speak, not just what they say

Azure Cognitive Services drives the pronunciation engine — breaking speech into individual phonemes and comparing against native speaker models.

🎤 Audio Capture → 🔄 Format Normalize → 🧠 Azure Engine → 📊 Phoneme Analysis → 🎯 Feedback Display
Three Assessment Modes

Detects specific sound issues like /θ/ vs /s/ confusion in "think" vs "sink." Uses IPA for universal sound representation — the same system used in professional PTE, IELTS, and DSE preparation.

For discrete vocabulary testing with HundredMark grading system integration. Quick pass/fail with granular score breakdown.

For longer passages or presentation scripts. Configurable with 3000ms end silence timeout for younger learners. Captures fluency and pacing over entire paragraphs.

Pedagogical Scoring System
Accuracy: 60%
Fluency: 25%
Completeness: 15%
Hard Gate: Completeness below 40% caps overall score — prevents inflated assessments for word-skipping.
Example: "exact" → /ɪɡˈzækt/
/ɪ/ Short i ✓
/ɡ/ Hard g ✓
/ˈz/ Z sound ✓
/æ/ A-as-in-cat ⚠
/kt/ K-T ending ✓

Color-coded phoneme feedback — students see exactly which sounds need work

MODULE 3

Voice Signature Assessment

Understanding how students communicate, not just what they know — Psychometric profiling through AI speech analysis

Weighted ipsative scoring across 10 forced-choice ranking dimensions analyzes communication patterns and cognitive styles, creating a personalized learner profile.

Five Core Competencies
🎯
Adaptability (35-95%)
Rewards flexible thinking using multiple communication archetypes
🎨
Expressive Range (35-95%)
Measures vocabulary diversity and rhetorical flexibility
🔊
Fluency/Clarity (35-95%)
Evaluates speech flow and comprehensibility
⚖️
Calibration (35-95%)
Assesses self-awareness in communication contexts
🏔️
Resilience (35-95%)
Measures emotional steadiness — Mountain/Volcano patterns
Archetype Classification

Students are categorized into five profiles using weighted voting:

Visionary: Strategic communicators
Performer: Expressive & dynamic
Nurturer: Supportive collaborators
Organizer: Systematic planners
Thinker: Analytical processors
Personalization Impact: Archetype results drive personalized AI tutor interactions, scenario selection in Conversation Role Play, and adaptive learning pathway generation.
Tone Mix Dimensions
Thud: Grounded
Boing: Energetic
Mom: Nurturing
Pow: Assertive

Underused tones highlighted for targeted practice assignment

MODULE 4

Lexical Intelligence & Vocabulary Builder

From Live Translation captures to AI-generated flashcards — with video context and Cambridge Dictionary definitions

Three-Tier Word Enrichment
🌍
Tier 1: Cambridge Dictionary
IPA phonetics, part-of-speech, contextual examples, English↔Chinese Traditional
🧠
Tier 2: AI Contextual Definition
ZhiPuAI generates definitions when dictionary entries prove insufficient
🎬
Tier 3: Pexels Video Clips
Authentic contextual video — HD MP4 optimized for educational bandwidth
Content Safety: CambridgeDictionary.java maintains regex-based sensitivity filters ensuring age-appropriate vocabulary delivery for primary and secondary deployments.
Vocabulary Pipeline
📝 Capture Word → 📖 Dictionary Lookup → 🎬 Video Context → 🃏 Flashcard Generate → 🧠 Spaced Review
Mastery Classification
Active: Score ≤ 90% (continued practice)
Mastered: Score > 90% (Your Library)

Binary system implements "desirable difficulties" — separating mastered vocabulary from active learning to prevent cognitive overload

MODULE 5

AI-Moderated Group Discussions

From classroom debates to remote study sessions — AI handles moderation while teachers monitor analytics

🏠
Same-Device Multi-Speaker

Voice biometrics (speaker_id differentiation) analyzes group discussions on shared devices. Each participant receives individual pronunciation scores and logic assessment.

  • Classroom settings with limited hardware — one tablet per group
  • Generates individual SpeakerScore and UtteranceSegment data
  • Teacher analytics show participation patterns per group member
🌐
Multi-Device AI Moderation

For remote learning, the system processes audio streams to determine optimal AI intervention points (join_at_seconds). AI generates contextual Socratic responses as audio streams.

  • Students receive guided questioning during debates — 24/7 availability
  • Hong Kong DSE speaking format directly supported
  • AI moderator runs without teacher supervision for homework practice
Live Participation Heatmap
● High participation ● Moderate ● Needs encouragement

Real-time analytics reveal which students dominate discussions versus those who need encouragement

Teacher Insight: "How many of you have students who never speak up in group work? The heatmap shows you exactly who's participating and who's quiet, so you can intervene."
MODULE 6

Audio Intelligence Pipeline

Handling real-world classroom conditions — noise, accented speech, and imperfect recordings

🔄
Format Normalization
AudioConvert.java — FFmpeg wrapper

Converts browser WebM recordings to Azure-compatible WAV (PCM 16-bit, 16kHz mono). Duration analysis via ffprobe supports HKDSE timing standards.

  • Specialized parsing for WebM streams that arrive without headers
  • HKDSE timing standards compliance via ffprobe
  • Precision audio cutting isolates mispronounced segments
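The conversion and duration steps above map onto standard FFmpeg and ffprobe options. The sketch below only builds the command lines and wraps the conversion call. It is not the content of AudioConvert.java, the function names are hypothetical, and it assumes `ffmpeg` and `ffprobe` are on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(src: PathLike, dst: PathLike) -> List[str]:
    """WebM (or any input) to WAV: mono, 16 kHz, signed 16-bit little-endian PCM."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(src),
        "-ac", "1",              # one audio channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        "-c:a", "pcm_s16le",     # 16-bit PCM
        str(dst),
    ]


def build_ffprobe_duration_command(src: PathLike) -> List[str]:
    """Prints the container duration in seconds as a bare number."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        str(src),
    ]


def convert_to_wav(src: PathLike, dst: PathLike) -> None:
    """Runs the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Keeping command construction separate from execution makes the exact flags easy to unit-test without invoking FFmpeg.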
📡
Voice Activity Detection
SpeechVADFrameDetector.java

On-device VAD using RMS (energy) and ZCR (zero-crossing rate) analysis. Adaptive noise floor from first ~1 second, then exponential smoothing (α=0.02).
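A rough sketch of an energy-plus-ZCR detector along these lines, not the code in SpeechVADFrameDetector.java. Only α=0.02 and the ~1 second calibration window come from the slide. The 20 ms frame length, the energy ratio, the ZCR ceiling, and the choice to update the noise floor only on non-speech frames are assumptions made for illustration.

```python
import math
from typing import List, Sequence

ALPHA = 0.02             # smoothing factor from the slide
CALIBRATION_FRAMES = 50  # ~1 s, ASSUMING 20 ms frames
ENERGY_RATIO = 3.0       # ASSUMPTION: speech must exceed the noise floor by this factor
MAX_ZCR = 0.35           # ASSUMPTION: very high ZCR is treated as hiss, not speech


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zcr(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def detect_speech(frames: Sequence[Sequence[float]]) -> List[bool]:
    """Returns one speech/non-speech flag per frame."""
    flags: List[bool] = []
    noise_floor = 0.0
    for i, frame in enumerate(frames):
        energy = rms(frame)
        if i < CALIBRATION_FRAMES:
            # Running mean of energy over the calibration window.
            noise_floor += (energy - noise_floor) / (i + 1)
            flags.append(False)
            continue
        is_speech = energy > noise_floor * ENERGY_RATIO and zcr(frame) <= MAX_ZCR
        if not is_speech:
            # Exponential smoothing lets the floor follow slow background changes.
            noise_floor = (1 - ALPHA) * noise_floor + ALPHA * energy
        flags.append(is_speech)
    return flags
```

With a small α the floor adapts slowly, so a steady source such as an air conditioner is absorbed into the floor while a short burst of speech barely moves it.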

✂️
Segmentation
VadService.java

300ms pre/post padding preserves natural pauses. 400ms minimum segment threshold removes noise artifacts while keeping meaningful content.
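The padding and minimum-length rules can be sketched as a pass over per-frame speech flags. This is an illustrative version, not VadService.java: the 20 ms frame length, the function name, applying the 400ms threshold before padding, and merging segments whose padding overlaps are all assumptions.

```python
from typing import List, Sequence, Tuple

PAD_MS = 300          # pre/post padding from the slide
MIN_SEGMENT_MS = 400  # minimum segment length from the slide


def segments_from_flags(flags: Sequence[bool], frame_ms: int = 20) -> List[Tuple[int, int]]:
    """Turns per-frame speech flags into padded (start_ms, end_ms) segments."""
    total_ms = len(flags) * frame_ms
    raw: List[Tuple[int, int]] = []
    start = None
    # The trailing False is a sentinel that closes a run ending at the last frame.
    for i, speech in enumerate(list(flags) + [False]):
        if speech and start is None:
            start = i
        elif not speech and start is not None:
            raw.append((start * frame_ms, i * frame_ms))
            start = None

    # Drop short bursts (clicks, coughs) before padding is applied.
    kept = [(s, e) for s, e in raw if e - s >= MIN_SEGMENT_MS]

    padded: List[Tuple[int, int]] = []
    for s, e in kept:
        s, e = max(0, s - PAD_MS), min(total_ms, e + PAD_MS)
        if padded and s <= padded[-1][1]:
            # Padding made this segment touch the previous one, so merge them.
            padded[-1] = (padded[-1][0], max(padded[-1][1], e))
        else:
            padded.append((s, e))
    return padded
```

Filtering before padding matters: padding first would stretch a 100ms click to 700ms and let it through the 400ms threshold.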

Processing Pipeline
🎤 Browser records WebM via RecordRTC
🔄 FFmpeg converts to WAV (16kHz PCM)
📡 VAD detects speech vs silence
✂️ Segmentation trims silence
🧠 Azure Speech analysis
Real-World Classroom Handling
🌬️ Air conditioner noise rejection
🪑 Chair scraping ignored
🤧 Coughs & interruptions
🔊 Corridor noise
MODULE 7

Semantic Similarity Engine

Understanding student intent, not just exact words — Multi-algorithm ensemble for flexible, natural language matching

Multi-Algorithm Ensemble
Jaro-Winkler: 15%
Levenshtein: 15%
Cosine Similarity: 35%
Monge-Elkan: 35%

Weights favor semantic meaning and phrase structure over exact character matching — students who rearrange words or use synonyms still receive credit.

Pass Threshold: 0.6
60% semantic similarity counts as correct. Higher thresholds (0.8) applied for grammar-sensitive assessments.
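A sketch of how the four weights and the pass threshold might be applied, assuming each algorithm returns a similarity in the range 0 to 1. The per-algorithm scores are taken as inputs here rather than computed. The one exception is `token_cosine`, a bag-of-words cosine included only to make the word-reorder case concrete; whether the platform's cosine component works on tokens is an assumption, and all function names are hypothetical.

```python
import math
import re
import string
from collections import Counter
from typing import Dict

WEIGHTS = {"jaro_winkler": 0.15, "levenshtein": 0.15, "cosine": 0.35, "monge_elkan": 0.35}


def normalize(text: str) -> str:
    """Lower-case, strip ASCII punctuation, collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def token_cosine(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors (ASSUMED form of the cosine component)."""
    ca, cb = Counter(normalize(a).split()), Counter(normalize(b).split())
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def ensemble_score(scores: Dict[str, float]) -> float:
    """Weighted sum of the four per-algorithm similarities."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passes(scores: Dict[str, float], threshold: float = 0.6) -> bool:
    """0.6 is the standard pass mark; 0.8 is used for grammar-sensitive exercises."""
    return ensemble_score(scores) >= threshold
```

Because word order does not affect a bag-of-words cosine, "I went to the store yesterday" scores essentially 1.0 against "Yesterday I went to the store" on that component, which is why the reordered answer can pass even when character-level measures score it lower.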
Example Student Responses
Reference: "Yesterday I went to the store"
✓ "I went to the store yesterday" — PASS (word reorder)
Reference: "The cat sat on the mat"
✓ "On the mat, the cat sat" — PASS (syntactic variation)
Reference: "The weather is pleasant today"
✗ "I like pizza very much" — FAIL (different topic)
  • Lower-casing and punctuation removal
  • Whitespace normalization
  • Handles imperfect classroom transcription
  • Teacher speech in Note Capture correctly matches references
  • 0.6 = standard pass for general vocabulary exercises
  • 0.8 = higher bar for grammar-sensitive contexts
  • Adjustable per exercise type and difficulty level
  • Override capability for teacher-defined assessments
MODULE 8

Enterprise Security Infrastructure

Protecting student data and assessment integrity — PCI DSS compliant payment processing and enterprise authentication

🛡️
reCAPTCHA

Google reCAPTCHA v2/v3 for assessment integrity. Prevents bots from corrupting AI personalization data.

Authentic learners only
🔑
JWT Security

Spring Security with JWT tokens. Role-based permissions: students, teachers, administrators.

Spring Security
💳
Stripe Payments

3D Secure authentication for international cards. CustomerSession API. PCI DSS compliant.

PCI DSS
📊
Operational Monitoring

Microsoft Teams webhook alerts for real-time operational notifications. Theme-colored by severity.

Teams Integration
🏫
B2B School Support

Institutional billing with purchase orders. Bulk subscription management for school-wide deployments. Usage-based pricing for per-minute Live Translation. GDPR and COPPA compliance for educational settings with minors.

K-12 Ready
🔒
Data Privacy Controls

Role-based access ensures teachers see assigned students only. Admin views aggregate school data without accessing individual profiles. Parent consent required for under-13 accounts. Data export and deletion capabilities.

WCAG 2.1 AA
INTEGRATION

LMS Integration & Subscription Engine

Seamless integration with existing educational infrastructure — Thinkific LMS connectivity and flexible subscription management

📚
Thinkific LMS Integration

REST API integration with robust pagination (200 items/page) and automatic retry with exponential backoff for HTTP 429 rate limiting.

  • Pull course content and enrich with AI — pronunciation, vocabulary, real-time feedback
  • Structured lessons with Clearable criteria and ClearedLessonData tracking
  • Type-safe entity mapping for Users, Courses, and Enrollments
  • GraphQL integration planned for real-time content sync during Live Translation
🎯 Course Adventure: Guided learning routes with clear objectives, milestones, and achievements — "learning feels purposeful, motivating, and game-like."
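The pagination and retry behavior described for the Thinkific integration can be sketched as follows. Only the 200 items/page size and the use of exponential backoff on HTTP 429 come from the slide. The `fetch_page` callable is a hypothetical stand-in for the real HTTP client, and the base delay, growth factor, retry count, and delay cap are assumptions.

```python
import time
from typing import Any, Callable, List, Optional, Tuple

PAGE_SIZE = 200  # items per page, from the slide


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_retries: int = 5, cap: float = 30.0) -> List[float]:
    """Delay before each retry: base, base*factor, ... capped at `cap` (all ASSUMED values)."""
    return [min(cap, base * factor ** i) for i in range(max_retries)]


def fetch_all(fetch_page: Callable[[int, int], Tuple[int, List[Any]]],
              sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Collects every page. fetch_page(page, limit) -> (http_status, items) is hypothetical."""
    items: List[Any] = []
    page = 1
    while True:
        schedule: List[Optional[float]] = [*backoff_delays(), None]
        for delay in schedule:
            status, batch = fetch_page(page, PAGE_SIZE)
            if status != 429:
                break
            if delay is None:  # retries exhausted
                raise RuntimeError(f"page {page}: still rate limited after retries")
            sleep(delay)
        if status != 200:
            raise RuntimeError(f"page {page}: unexpected HTTP status {status}")
        items.extend(batch)
        if len(batch) < PAGE_SIZE:  # a short page means this was the last one
            return items
        page += 1
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.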
💰
Stripe Subscription Engine

Complete lifecycle management with automatic billing, trial-to-paid conversion tracking, and configurable pricing models.

  • Trial subscriptions with configurable day lengths
  • Automatic payment capture and plan upgrades/downgrades
  • Auto-renewal toggle with payment failure handling
  • Customer metadata: from, thinkandspeak, active profile
Monetization Models
Free Trial · Paid Subscriptions · Per-Minute Live Translation · Tiered DSE Prep · AI Tutor Credits · Bulk School Licenses
CONCLUSION

Think & Speak:
A Learning Operating System

From the Phoneme to the Paragraph

🎓
For Students

• AI Conversation Partner 24/7
• Phoneme-level pronunciation feedback
• Video flashcard vocabulary system
• Personalized learning pathways

👩‍🏫
For Teachers

• Live Translation classroom support
• Discussion heatmap analytics
• SBA preparation modules
• Real-time progress dashboards

🏫
For Schools

• Thinkific LMS integration
• Enterprise security (PCI, GDPR)
• Scalable 99.9% SLA infrastructure
• Bulk subscription management

Ready to Transform English Education?
Setup takes less than one hour · Free trial available · Full teacher training included

Think & Speak — See Change AI English · SeeChange Education · www.thinkandspeak.com