SECTION 01

Platform Architecture & Core Ecosystem

Spring Boot 3.3.10 · Java 21 · Redis · Azure Speech Services · WebSocket Real-Time Communication

🏗️ Enterprise Foundation

The platform runs on Spring Boot 3.3.10 and Java 21, using reactive programming for real-time educational interactions. It is built as production infrastructure rather than a startup prototype, with the reliability schools need. Automatic failover is intended to keep server issues from costing class time.

WebSocket Support · Flyway Schema Versioning · Redis Session Management · AWS S3 Compatible Storage

💡 "One Platform. One Learning System." The platform connects Thinkific LMS courses, AI pronunciation engines, and real-time classroom tools into a single ecosystem. The spring-cache layer reduces database load during high-traffic exam preparation periods by caching frequently accessed learning paths.

🤖 Multi-LLM Fallback Architecture

The platform maintains a pool of Azure OpenAI endpoints. Continuous heartbeat monitoring detects node failures. When a node fails, traffic reroutes to healthy instances. Only when all Azure nodes are unavailable does the system invoke the ZhiPuAI GLM-4-Flash model as backup.

This tiered approach prioritizes quality while ensuring continuity: schools need technology that works while students are taking mock exams. If the primary service has issues, the system switches AI providers automatically, within milliseconds.
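The tiered selection described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Endpoint` shape, the `pickEndpoint` function, and the endpoint names are hypothetical, and the `healthy` flag is assumed to be maintained by the heartbeat monitor.

```typescript
// Hypothetical shapes: the source does not describe the real data model.
interface Endpoint {
  name: string;
  provider: "azure-openai" | "zhipu-glm-4-flash";
  healthy: boolean; // assumed to be updated by the heartbeat monitor
}

// Returns the first healthy Azure endpoint. The backup provider is used
// only when every Azure endpoint is marked unhealthy.
function pickEndpoint(pool: Endpoint[]): Endpoint | undefined {
  const azure = pool.filter(e => e.provider === "azure-openai" && e.healthy);
  if (azure.length > 0) return azure[0];
  const backup = pool.filter(e => e.provider === "zhipu-glm-4-flash" && e.healthy);
  return backup.length > 0 ? backup[0] : undefined;
}

const examplePool: Endpoint[] = [
  { name: "azure-a", provider: "azure-openai", healthy: false },
  { name: "azure-b", provider: "azure-openai", healthy: true },
  { name: "glm-backup", provider: "zhipu-glm-4-flash", healthy: true },
];
```

With this pool, `pickEndpoint(examplePool)` selects `azure-b`; the backup is chosen only once both Azure entries are unhealthy.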

🌍 Culturally-Aware AI

ZhiPuAI GLM-4-Flash, a model developed in China, adds culturally-aware pedagogical understanding. It handles Hong Kong Traditional Chinese orthography verification (「裏」vs「裡」), alignment with local HKDSE examination standards, and recognition of syntactic challenges specific to Chinese learners.

📝 Assessment (Profile Capture) → 📚 Vocabulary (5-Stage Pipeline) → 🎤 Speaking (Role Play & Practice) → 📖 Reading (Exam Preparation) → 🌐 Live Translation (Real-Time Bilingual) → 🤖 AI Coach (Personalized Feedback)

SECTION 02

5-Stage Vocabulary Acquisition Pipeline

Pronunciation → Spelling → Usage I → Usage II → Image Association · 10 Difficulty Levels · Spaced Repetition

🎙️ Pronunciation (35%) → ✏️ Spelling (35%) → 📝 Usage I (20%) → 📖 Usage II (20%) → 🖼️ Image (25%)

🎯 Mastery Progression

Each task type maps to a specific pedagogical purpose. The progressive difficulty builds competence across multiple dimensions. Students must demonstrate proficiency before advancing.

Beginner → Mastery

📊 Weighting Philosophy

Spelling: 35% — Orthographic precision indicates deep encoding critical for academic writing in HKDSE.

Pronunciation: 35% — Directly observable skill; primary barrier for Chinese-speaking students.

Usage & Image: 30% — Tests authentic application in real contexts.

🎮 Gamified Retry System

The platform implements a lives system with per-task retry limits. Each retry deducts points, encouraging quality over quantity. Limited attempts create productive struggle.

Pronunciation: 5 attempts
Spelling: 3 attempts
Usage: 3 attempts
Image: 3 attempts
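
The retry limits above can be expressed as a small lookup plus a scoring rule. This is a sketch only: the attempt limits come from the list above, but `BASE_POINTS`, `PENALTY_PER_RETRY`, and both function names are assumptions, since the source does not say how many points a retry deducts.

```typescript
type TaskType = "pronunciation" | "spelling" | "usage" | "image";

// Attempt limits as listed above.
const MAX_ATTEMPTS: Record<TaskType, number> = {
  pronunciation: 5,
  spelling: 3,
  usage: 3,
  image: 3,
};

// Assumed values: the source says retries deduct points but gives no amounts.
const BASE_POINTS = 100;
const PENALTY_PER_RETRY = 10;

// True while the student still has attempts ("lives") left for this task.
function canRetry(task: TaskType, attemptsUsed: number): boolean {
  return attemptsUsed < MAX_ATTEMPTS[task];
}

// Points awarded when the task is passed on attempt number `attempt` (1-based).
function pointsForAttempt(task: TaskType, attempt: number): number {
  if (attempt < 1 || attempt > MAX_ATTEMPTS[task]) return 0;
  return Math.max(0, BASE_POINTS - (attempt - 1) * PENALTY_PER_RETRY);
}
```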

📈 10 Difficulty Levels: From 'First Words' to 'Extreme Words'

Each level maps to authentic HKDSE difficulty expectations. The platform implements progressive difficulty aligned with Hong Kong's English Language curriculum. Levels escalate from 'First Words' (basic everyday vocabulary) through 'Simple Words,' 'Basic Words,' 'Learning Words,' 'Intermediate Words,' 'Advanced Words,' 'Expert Words,' 'Master Words,' 'Specialist Words,' to 'Extreme Words' (HKDSE-level academic vocabulary). Each level contains curated word lists specifically selected for that proficiency tier.

The Cambridge Dictionary integration provides authoritative definitions with IPA phonetic transcription and contextual examples. Students can filter captured vocabulary by specific Live Translation lessons, and the system maintains the connection between vocabulary and its originating content for contextual review—directly supporting the 'seamless content integration' objective.

SECTION 03

Phoneme-Level Pronunciation Analysis

Azure Speech Services · RecordRTC · 16kHz Sample Rate · WAV Lossless · Phoneme-by-Phoneme Scoring

🎙️ Audio Capture Pipeline

The system uses the RecordRTC library with StereoAudioRecorder for cross-browser compatibility. Recording parameters: sampleRate: 16000 (optimized for speech), mimeType: 'audio/wav' (lossless quality), maxDuration: 10 seconds.

Debounce protection (300ms) prevents rapid double-clicks that could corrupt recording state. The checkMicrophone() function triggers the browser's permission dialog for microphone access. Recording starts and stops based on voice activity detection.

🔊 Audio Technical Specifications

Format Normalization: The AudioConvert.java FFmpeg wrapper handles browser-based recording formats (WebM) and converts to Azure-compatible WAV (PCM 16-bit, 16kHz, mono). Duration analysis via ffprobe supports HKDSE timing standards. Precision audio cutting isolates specific mispronounced segments for focused practice.

Voice Activity Detection: The VADFrameDetector uses RMS (energy) and ZCR (zero-crossing rate) analysis. An adaptive noise floor bootstraps from the first ~1 second of audio, then uses exponential smoothing to adapt to changing classroom acoustics. This helps it cope with coughing, chair-scraping, and air-conditioning noise.
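
A minimal sketch of RMS/ZCR voice activity detection with an adaptive noise floor follows. It is not the VADFrameDetector itself: the class name `SimpleVad`, the smoothing factor, the energy ratio, and the ZCR ceiling are all assumed values chosen for illustration.

```typescript
// Root-mean-square energy of one frame of PCM samples in [-1, 1].
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

// Fraction of adjacent sample pairs whose sign differs.
function zeroCrossingRate(frame: number[]): number {
  if (frame.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

class SimpleVad {
  private noiseFloor = 0;
  private framesSeen = 0;
  private readonly bootstrapFrames: number; // frames used to seed the noise floor
  private readonly alpha: number;           // smoothing factor (assumed)
  private readonly energyRatio: number;     // speech must exceed floor by this ratio (assumed)
  private readonly maxZcr: number;          // rejects hiss-like frames (assumed)

  constructor(bootstrapFrames: number, alpha: number = 0.05, energyRatio: number = 3, maxZcr: number = 0.5) {
    this.bootstrapFrames = bootstrapFrames;
    this.alpha = alpha;
    this.energyRatio = energyRatio;
    this.maxZcr = maxZcr;
  }

  isSpeech(frame: number[]): boolean {
    const energy = rms(frame);
    if (this.framesSeen < this.bootstrapFrames) {
      // Bootstrap: running average of the first frames, treated as background.
      this.noiseFloor = (this.noiseFloor * this.framesSeen + energy) / (this.framesSeen + 1);
      this.framesSeen++;
      return false;
    }
    const speech = energy > this.noiseFloor * this.energyRatio
      && zeroCrossingRate(frame) <= this.maxZcr;
    // Only non-speech frames update the floor, via exponential smoothing.
    if (!speech) this.noiseFloor = (1 - this.alpha) * this.noiseFloor + this.alpha * energy;
    return speech;
  }
}
```

Updating the floor only on non-speech frames keeps a long utterance from raising the threshold against itself; a real detector would also need to handle frames during bootstrap that already contain speech.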

Segmentation: Transforms raw PCM streams into pedagogically useful segments with 300ms pre/post padding and 400ms minimum segment thresholds. Trimmed audio preserves natural pauses while removing dead air.
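
The padding and minimum-length rules can be sketched as below. The 300ms and 400ms values come from the text above; the `Segment` type, the `buildSegments` name, the choice to apply the minimum before padding, and the merging of overlapping padded segments are assumptions made for this illustration. Input intervals are assumed to be sorted by start time.

```typescript
interface Segment {
  startMs: number;
  endMs: number;
}

const PAD_MS = 300;         // pre/post padding from the text above
const MIN_SEGMENT_MS = 400; // minimum segment length from the text above

// Turns detected speech intervals into padded segments clipped to the recording.
function buildSegments(speech: Segment[], totalMs: number): Segment[] {
  const out: Segment[] = [];
  for (let i = 0; i < speech.length; i++) {
    const s = speech[i];
    if (s.endMs - s.startMs < MIN_SEGMENT_MS) continue; // too short: treated as noise
    const padded: Segment = {
      startMs: Math.max(0, s.startMs - PAD_MS),
      endMs: Math.min(totalMs, s.endMs + PAD_MS),
    };
    const last = out.length > 0 ? out[out.length - 1] : undefined;
    if (last && padded.startMs <= last.endMs) {
      last.endMs = Math.max(last.endMs, padded.endMs); // merge overlapping segments
    } else {
      out.push(padded);
    }
  }
  return out;
}
```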

🎯 Why 16kHz?

Most of the speech energy that matters for phoneme analysis lies below about 8kHz. By the Nyquist theorem, capturing frequencies up to 8kHz requires sampling at no less than twice that rate, which is 16kHz. This balances clarity and file size: higher rates add bulk without a meaningful quality gain for phoneme analysis.

📊 4-Dimensional Scoring System

Accuracy: 85% · Phoneme Precision

Fluency: 72% · Rhythm & Prosody

Completeness: 91% · Word Integrity

Pronunciation: 68% · Overall Quality

🏷️ Score Thresholds & Color System

91-100 (Excellent): Perfect pronunciation, native-like accuracy
71-90 (Great): Minor deviations, highly intelligible
61-70 (Good Try): Noticeable errors, more practice needed
51-60 (Not Bad): Significant errors affect clarity
1-50 (Needs Work): Multiple phoneme errors detected
0 (Gray): No speech detected
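
The thresholds above map directly onto a lookup function. A minimal sketch, assuming integer scores from 0 to 100; the `Band` type and `scoreBand` name are illustrative, and "No Speech" stands in for the gray state.

```typescript
type Band = "Excellent" | "Great" | "Good Try" | "Not Bad" | "Needs Work" | "No Speech";

// Maps an integer score (0-100) to the band listed above.
function scoreBand(score: number): Band {
  if (score >= 91) return "Excellent";
  if (score >= 71) return "Great";
  if (score >= 61) return "Good Try";
  if (score >= 51) return "Not Bad";
  if (score >= 1) return "Needs Work";
  return "No Speech"; // score of 0: shown in gray
}
```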

Hard Gate: Completeness below 40% caps overall score to prevent inflated assessments. Non-linear gamma compression normalizes score distributions for pedagogical accuracy over confidence metrics.
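
One way the hard gate and gamma compression could combine is sketched below. Only the 40% completeness gate comes from the text; the cap of 50, the exponent of 1.5, the power-law form, and the function names are assumptions, because the source does not give the actual formula.

```typescript
const COMPLETENESS_GATE = 40; // from the text above
const GATED_CAP = 50;         // assumed cap applied when the gate triggers
const GAMMA = 1.5;            // assumed exponent; values above 1 pull mid-range scores down

// Power-law (gamma) compression of a 0-100 score.
function gammaCompress(raw: number): number {
  const clamped = Math.min(100, Math.max(0, raw));
  return 100 * Math.pow(clamped / 100, GAMMA);
}

// Applies compression first, then caps the result if completeness is too low.
function finalScore(rawOverall: number, completeness: number): number {
  const compressed = gammaCompress(rawOverall);
  const gated = completeness < COMPLETENESS_GATE
    ? Math.min(compressed, GATED_CAP)
    : compressed;
  return Math.round(gated);
}
```

Under these assumed constants, a raw score of 64 with good completeness becomes 51, while a raw score of 100 with 30% completeness is capped at 50.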

SECTION 04

Session History & Vocabulary Management

Word Extraction · Entity Recognition · Semantic Search · Auto-Flashcard Pipeline

🔍 Semantic Search Engine

The search system implements splitIgnoreCases to split query text while ignoring case. Results display with ±50 character context windows around matches. This is critical for HKDSE preparation—students need vocabulary in context, not isolated word lists.
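
A case-insensitive search with ±50 character context windows might look like the sketch below. It does not reproduce splitIgnoreCases; `findWithContext` is a hypothetical helper that returns one snippet per match.

```typescript
const CONTEXT_CHARS = 50; // window size from the text above

// Returns one snippet per case-insensitive match of `query` in `text`.
// Note: toLowerCase can change string length for a few characters
// (for example "İ"), which this sketch does not handle.
function findWithContext(text: string, query: string): string[] {
  const snippets: string[] = [];
  if (query.length === 0) return snippets;
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) break;
    const start = Math.max(0, at - CONTEXT_CHARS);
    const end = Math.min(text.length, at + needle.length + CONTEXT_CHARS);
    snippets.push(text.slice(start, end)); // original casing is preserved
    from = at + needle.length;
  }
  return snippets;
}
```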

Sorting on dayjs(createTime).valueOf() yields descending chronological order (newest first). Room-based message aggregation creates a 'learning journal' in which every class discussion is searchable and reviewable.

🔤 Word Extraction Pipeline

The separateWords function implements the extraction engine:

words = text.split(/\s+/) → Split on whitespace
words = words.filter(w => !/[^\p{L}]/u.test(w)) → Keep only tokens made entirely of letters (drops tokens containing punctuation)
words = words.filter(w => !/\d/.test(w)) → Remove numeric tokens

The Unicode-aware pattern \p{L} (used with the u flag) matches letters in any script, so English words and Chinese characters both pass the filter while punctuation and digits do not. Students encounter English, Cantonese, and Mandarin within the same session, and tokens in all three survive extraction. This is 'structured vocabulary capture': every new word becomes a learning opportunity, automatically tagged with time, topic, and lesson context.
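
A runnable sketch of the extraction step follows. It deliberately differs from the filter shown above in one respect: it trims punctuation from the edges of each token before the letters-only test, so "world," is kept as "world" instead of being dropped. That variation, and the constant names, are assumptions of this sketch.

```typescript
// Built with the RegExp constructor so the Unicode patterns are checked at run time.
const EDGE_PUNCTUATION = new RegExp("^\\p{P}+|\\p{P}+$", "gu");
const LETTERS_ONLY = new RegExp("^\\p{L}+$", "u"); // no g flag, so test() is stateless

function separateWords(text: string): string[] {
  return text
    .split(/\s+/)                                       // split on whitespace
    .map(token => token.replace(EDGE_PUNCTUATION, "")) // trim leading/trailing punctuation
    .filter(token => LETTERS_ONLY.test(token));        // drop empty, numeric, and mixed tokens
}

// Expected: ["Hello", "world", "你好"]
const extracted = separateWords("Hello world, 123 你好 abc1");
```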

✨ Entity Recognition System

The platform cross-references captured vocabulary against the student's persistent taggedWordListData. Known words receive green highlighting, signaling 'known territory.' New words appear in blue, representing the zone of proximal development.

The getWordEntity and hasWordEntity functions enable case-insensitive matching. Clicking recognized words triggers instant practice via LiveWordPracticeDialog.
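
Case-insensitive matching of this kind can be sketched with a lower-cased index. The function names follow the text above, but their signatures, the `WordEntity` shape, and the `buildIndex` and `highlightColor` helpers are assumptions for illustration.

```typescript
// Hypothetical shape of one entry in taggedWordListData.
interface WordEntity {
  word: string;
}

type WordIndex = { [lowerCasedWord: string]: WordEntity };

// Index by lower-cased word so lookups ignore case.
function buildIndex(tagged: WordEntity[]): WordIndex {
  const index: WordIndex = {};
  for (let i = 0; i < tagged.length; i++) {
    index[tagged[i].word.toLowerCase()] = tagged[i];
  }
  return index;
}

function getWordEntity(index: WordIndex, word: string): WordEntity | undefined {
  const key = word.toLowerCase();
  return Object.prototype.hasOwnProperty.call(index, key) ? index[key] : undefined;
}

function hasWordEntity(index: WordIndex, word: string): boolean {
  return getWordEntity(index, word) !== undefined;
}

// Known words are highlighted green, new words blue, as described above.
function highlightColor(index: WordIndex, word: string): "green" | "blue" {
  return hasWordEntity(index, word) ? "green" : "blue";
}
```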

🏷️ Categorization & Smart Filtering

The categorization system turns vocabulary lists into personalized learning collections:

🔴 Not Practiced · 🟡 In Progress · 🟢 Mastered · 📌 To Practice

Full CRUD taxonomy: Students create custom categories like 'Presentation Skills' or 'DSE Prep' and assign words accordingly. The slugifyTag function supports unlimited custom categories.

Dual-predicate filtering: Apply both category tags and full-text search simultaneously. The filteredWords computation enables real-time filtering without triggering full component re-renders.
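
The two predicates can be combined in a single pass, as in the sketch below. The slug rule shown for `slugifyTag` is an assumed one (lower-case, non-alphanumeric runs collapsed to hyphens), and `VocabWord` and `filterWords` are hypothetical names; the real filteredWords computation is not shown in the source.

```typescript
interface VocabWord {
  text: string;
  tags: string[]; // slugified category tags
}

// Assumed slug rule: lower-case, runs of other characters become "-".
function slugifyTag(label: string): string {
  return label
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Keeps words that match BOTH the selected tag (if any) and the search text (if any).
function filterWords(words: VocabWord[], tag: string | null, query: string): VocabWord[] {
  const q = query.trim().toLowerCase();
  return words.filter(w =>
    (tag === null || w.tags.indexOf(tag) !== -1) &&
    (q === "" || w.text.toLowerCase().indexOf(q) !== -1));
}
```

In a React component, a computation like this would typically be memoized on the word list, tag, and query so that it only reruns when one of them changes.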

SECTION 05

Sentence Picking Practice & Voice Assessment

Hybrid Listening-Speaking · Web Speech API · Silence Detection · Integrated Sentence-Dictation System

🎯 Hybrid Assessment Engine

The SentencePickingPractice component combines listening and speaking in a single task. Students hear a sentence read aloud, select the correct option, then record their own pronunciation. This mirrors authentic language use where comprehension precedes production.

The system conditionally loads CompatibleSpeechRecognition (Web Speech API polyfill) when isEnableSpeechRecognition is true. The recognition object is configured with continuous mode, en-US language, and interimResults disabled. When speech is detected, the system starts silence-duration counting via startTimerDetectSilenceDurationCounting().

⏱️ Four Concurrent Timer Systems

Four independent timer systems manage the learning flow:

1. Auto-Start Timer: 3-5 second countdown before recording begins, reducing anxiety and providing think-time.

2. Answer Timer: Enforces maximum recording duration, preventing indefinite recording.

3. Silence Detector: Stops recording when no speech detected for configured milliseconds—particularly valuable for classroom management, preventing students who are stuck or distracted from wasting time.

4. Auto-Progress Timer: Automatic advancement to next word after correct answer, maintaining learning momentum.

Each timer can be independently configured per learning scenario.
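
The per-scenario configuration and the silence rule can be sketched as below. `TimerConfig`, its field names, and the `SilenceDetector` class are hypothetical; timestamps are passed in explicitly so the logic stays independent of any particular timer API.

```typescript
// Hypothetical per-scenario configuration for the four timers.
interface TimerConfig {
  autoStartMs: number;    // countdown before recording begins
  maxAnswerMs: number;    // maximum recording duration
  silenceLimitMs: number; // stop after this much continuous silence
  autoProgressMs: number; // delay before advancing after a correct answer
}

// Tracks the time since speech was last heard.
class SilenceDetector {
  private lastSpeechAt: number;
  private readonly limitMs: number;

  constructor(limitMs: number, startedAt: number) {
    this.limitMs = limitMs;
    this.lastSpeechAt = startedAt;
  }

  // Call whenever the recognizer reports speech.
  onSpeech(now: number): void {
    this.lastSpeechAt = now;
  }

  // True once silence has lasted at least the configured limit.
  shouldStop(now: number): boolean {
    return now - this.lastSpeechAt >= this.limitMs;
  }
}

const exampleConfig: TimerConfig = {
  autoStartMs: 3000, maxAnswerMs: 10000, silenceLimitMs: 1500, autoProgressMs: 2000,
};
```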

📊 Integrated Sentence-Dictation System

The platform implements an integrated sentence-dictation system linking spoken input, speech recognition, and automatic assessment. Students listen to model sentences and practice dictation in a structured sequence.

Trigger: After all words are pronounced, the system fetches task data via callGetSentencePickingPracticeUsage, presents multiple choice options, validates selection (highlighting correct/incorrect), then plays model audio before prompting recording.

🔄 Multi-Modal Audio Architecture

The component implements three audio modes:

1. TTS Playback: Uses the useAudio hook from react-use for word pronunciation examples.

2. Student Recording: Uses MediaRecorder architecture identical to other components for consistency.

3. Auto-Play Sequencing: Implements timerPlayNextAudioRef for chain-playing word audio, definitions, and example sentences in the 'Audio Tour' experience.

This creates the 'hear → practice → compare' learning loop where students listen, record, then replay both audio tracks.

SECTION 06

Gamification & Progress Tracking

Points Economy · Achievement System · Mastery Classification · Activity Telemetry

🏆 Points Economy

The UnlockCategoryDialog implements a points-based progression system. Students earn points through practice, which unlock new vocabulary categories. The Progress component from Ant Design displays accumulated points with custom gradients.

The progress bar uses linear-gradient(90deg, #3695E5 0%, #90fc9b 51.92%, #ffcb51 98.08%) representing the journey from beginner (blue) to mastery (gold). The mascot character with thumbs-up creates emotional connection.

📊 Mastery Classification

The binary mastery system separates Active Learning (score ≤ 90) from Archived Mastered (score > 90). This implements cognitive load management:

Active Learning: ≤ 90% score · Not Passed

Mastered: > 90% score · Your Library

Students should not be overwhelmed by seeing 500 words every session. By separating mastered from active, students focus on what needs practice.
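
The split can be expressed as a single partition over scored words. A minimal sketch; `ScoredWord` and `partitionByMastery` are illustrative names, and the threshold is the 90 stated above.

```typescript
interface ScoredWord {
  word: string;
  score: number; // 0-100
}

const MASTERY_THRESHOLD = 90;

// Scores above the threshold are archived as mastered; the rest stay active.
function partitionByMastery(words: ScoredWord[]): { active: ScoredWord[]; mastered: ScoredWord[] } {
  const active: ScoredWord[] = [];
  const mastered: ScoredWord[] = [];
  for (let i = 0; i < words.length; i++) {
    if (words[i].score > MASTERY_THRESHOLD) mastered.push(words[i]);
    else active.push(words[i]);
  }
  return { active: active, mastered: mastered };
}
```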

🎯 Activity Telemetry System

Every user interaction is tracked through reportActivityLog:

WordList.filter.apply · WordDetail.record.start · WordDetail.record.stop · WordDetail.audio.play · WordDetail.audio.pause · PhonemeBoard.expand · Pagination.change · Category.filter.select

This telemetry feeds the AI Coach's personalization engine. The entity data captures categoryId, levelId, wordId, and vocabWord for granular analytics.
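
A telemetry record of this kind could be shaped as follows. The four entity fields come from the text above; their types, the `ActivityLog` wrapper, the name-validation pattern, and `buildActivityLog` are assumptions, since the real signature of reportActivityLog is not given.

```typescript
// Field names from the text above; string types are assumed.
interface ActivityEntity {
  categoryId?: string;
  levelId?: string;
  wordId?: string;
  vocabWord?: string;
}

interface ActivityLog {
  action: string;
  entity: ActivityEntity;
  timestamp: number;
}

// Matches names such as "WordDetail.record.start" or "Pagination.change".
const ACTION_PATTERN = /^[A-Z][A-Za-z]*(\.[a-z]+)+$/;

// Builds a log record, rejecting action names that do not follow the convention.
function buildActivityLog(action: string, entity: ActivityEntity, timestamp: number): ActivityLog {
  if (!ACTION_PATTERN.test(action)) {
    throw new Error("Unexpected action name: " + action);
  }
  return { action: action, entity: entity, timestamp: timestamp };
}
```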

🔗 Cross-Module Integration

The vocabulary system connects to all platform modules:

🌐 Live Translation → 📚 Word List → 🏋️ Vocab Trainer → 📊 AI Assessment → 🎯 Progress Dashboard → 🤖 AI Coach

The vocabulary system is the heart of the platform—it connects everything. When students use Live Translation in class, their captured words automatically appear here. Their pronunciation practice feeds back to the AI Coach, which adjusts future recommendations.

SECTION 07

Live Translation & Real-Time Classroom

WebSocket · Redis Sessions · Bilingual Subtitles · Engagement Heatmap · Teacher Broadcast Tools

🌐 Real-Time Translation Infrastructure

The platform's HybridEncryptor secures Azure access tokens before transmission to student devices. Tokens are embedded in presigned URLs with configurable expiry. The IP2Location service maintains monthly-updated IP prefix trees enabling O(log n) geolocation lookups.

The WebSocket architecture targets sub-second latency. Redis-backed session management enables real-time transcription synchronization. Each classroom session gets a Redis key, CharRoom:{sessionId}, that stores transcription buffers, engagement signals, and broadcast messages.

⚡ Bandwidth Optimized for Hong Kong Classrooms

Students often practice English during MTR commutes on limited data plans. Avatar compression targets 500KB, daily reading content is pre-cached, and audio is streamed rather than downloaded. These optimizations help learning continue on slow or metered connections.

📊 Engagement Heatmap System

The heatmap tracks three student signals:

✓ Understanding — Student confirms comprehension

❓ Question — Student signals confusion

○ Neutral — No explicit signal sent

The UserAuthorityEnum distinctions route messages: feedback channels for students aggregate into heatmap visualizations; broadcast channels for teachers enable announcements. Teachers can see at a glance which students need support.
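
Aggregating the three signals into heatmap counts can be sketched as below. The names `Signal`, `SignalEvent`, and `aggregateSignals` are hypothetical, and two behaviours are assumed: each student's latest signal wins, and students who sent nothing count as neutral.

```typescript
type Signal = "understanding" | "question" | "neutral";

interface SignalEvent {
  studentId: string;
  signal: Signal;
}

interface HeatmapCounts {
  understanding: number;
  question: number;
  neutral: number;
}

// Counts the latest signal per rostered student; silent students are neutral.
function aggregateSignals(roster: string[], events: SignalEvent[]): HeatmapCounts {
  const latest: { [studentId: string]: Signal } = {};
  for (let i = 0; i < events.length; i++) {
    latest[events[i].studentId] = events[i].signal; // later events overwrite earlier ones
  }
  const counts: HeatmapCounts = { understanding: 0, question: 0, neutral: 0 };
  for (let i = 0; i < roster.length; i++) {
    const id = roster[i];
    const signal: Signal = Object.prototype.hasOwnProperty.call(latest, id) ? latest[id] : "neutral";
    counts[signal]++;
  }
  return counts;
}
```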

📚 Content Pipeline: From Speech to Mastery

The 'Daily Reading' feature uses Redis caching for instant delivery. The system pre-caches level-appropriate passages using REDIS_PREFIX_DAILY_SCRIPT constants and student proficiency tier as composite keys.

MyBatis-Plus SFunction references enable type-safe dynamic querying by lexical density and syntactic complexity. This ensures the 'personalized based on assessed level' promise scales to thousands of concurrent students.

⏱️ Exam Simulation Mode

The exam mode is designed to replicate HKDSE Paper 3 (Listening and Integrated Skills) conditions:

Configurable timing constraints matching official examination standards
pronunciationAssessmentForWordContinuous mode for reading passages
PhraseListGrammar for 'strong hints' based on expected text
British English (en-GB) or American English (en-US) accent configurations
Comprehensive telemetry: timeOnTask, questionResponseTime, hesitationPatterns

Students practice under timed pressure so examination day feels familiar.

🔔 Teacher Broadcast & Communication Tools

The Mailgun integration with bilingual HTML templates ensures students receive timely, branded communications:

Welcome Email (EN/繁中) · Verification Codes · Magic Links · Subscription Notifications

Templates use responsive HTML with the Think & Speak brand identity. Action buttons are green (#4CAF50) and carry clear expiration warnings. Personalized variables include student name, assigned AI tutor persona, and first milestone objective.

SECTION 08

Security & Enterprise Readiness

RBAC · Multi-Tenant Isolation · AES-256 Encryption · GDPR/FERPA · Stripe Payments

🔐 Role-Based Access Control

The hierarchical permission model supports the complex stakeholder ecosystem of Hong Kong schools: students, teachers, school administrators, and LMS-integrated users.

👑 Admin (Full Access)

🏫 SchoolManager

👨‍🏫 TeacherManager

🎓 ThinkificUser

👤 CommonUser

👀 Readonly

🏫 Cumulative Permission Model

Higher roles inherit capabilities from lower roles; for example, SchoolManager also has TeacherManager permissions. When a Thinkific course instructor accesses the platform, they automatically receive TeacherManager privileges, mapped from Thinkific's 'course_admin' role.
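
A cumulative model like this reduces to a rank comparison. The sketch below assumes the role list above is a strict ranking from Admin down to Readonly, which the source implies but does not state; `hasRole`, `mapThinkificRole`, and the default mapping to ThinkificUser are hypothetical.

```typescript
// Lowest to highest privilege, assuming the list above is a strict ranking.
const ROLE_ORDER = [
  "Readonly",
  "CommonUser",
  "ThinkificUser",
  "TeacherManager",
  "SchoolManager",
  "Admin",
] as const;

type Role = typeof ROLE_ORDER[number];

// A role satisfies a requirement if it ranks at or above the required role.
function hasRole(actual: Role, required: Role): boolean {
  return ROLE_ORDER.indexOf(actual) >= ROLE_ORDER.indexOf(required);
}

// 'course_admin' maps to TeacherManager as described above;
// the fallback to ThinkificUser is an assumption.
function mapThinkificRole(thinkificRole: string): Role {
  return thinkificRole === "course_admin" ? "TeacherManager" : "ThinkificUser";
}
```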

🛡️ Multi-Tenant Data Isolation

Each school operates as an independent tenant with strict data segregation. The schoolId filtering at database query level ensures School A's data never appears in School B's dashboard.

School A: 150 Students · School B: 200 Students · School C: 180 Students

🔒 Each school's recordings, vocabulary, and analytics are completely isolated

🔒 Encryption & Compliance

The HybridEncryptor implementation uses:

AES-256-GCM: Provides confidentiality and integrity for voice recordings and student data. Voice recordings are encrypted at rest.

RSA-OAEP-SHA256: Wraps the AES key for transport. Tamper evidence for academic integrity comes from the AES-GCM authentication tag rather than from RSA-OAEP itself.

Base64URL Encoding: Ensures URL-safe API transmission.

Compliance: FERPA (US student privacy), GDPR (data protection), COPPA (children's online privacy). The sourceOfReferral field is strictly optional—minimizing data collection for minors. Failed delivery attempts trigger fallback SMS paths for institutional accounts.

SECTION 09

API Resilience & Multi-LLM Architecture

Azure OpenAI + ZhiPuAI Fallback · reCAPTCHA · Stripe Integration · Circuit Breakers · 99.9% Uptime

🔄 Multi-LLM Fallback

The system maintains a pool of Azure OpenAI endpoints. Continuous heartbeat monitoring detects failures. When a node fails, traffic reroutes to healthy instances. Only when all Azure nodes fail does ZhiPuAI GLM-4-Flash activate as backup.

The culturally-aware AI component handles Hong Kong Traditional Chinese orthography, HKDSE standards alignment, and Cantonese-influenced English pattern recognition.

⚡ Error Taxonomy & AI Coach Guidance

The system categorizes all errors into four types:

1xx System: Infrastructure errors (JWT expiration, network failures)

2xx Business: Logic errors (enrollment status, payment issues)

3xx Validation: User input errors (email validation, field formatting)

4xx External: Third-party errors (Stripe payments, speech recognition)

Each error code enables specific AI Coach feedback. For example, SpeechRecognitionNoMatch suggests 'speak more clearly,' while SpeechAudioConvertFailed suggests 'check your microphone.' Errors in the same category share a prefix, and unique subcodes enable targeted remediation.
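
The prefix-based taxonomy can be sketched as a lookup on the hundreds digit. This assumes codes are three-digit integers, which the "1xx" notation suggests but the source does not confirm; `categoryOf`, `coachHint`, and the fallback message are hypothetical, while the two hints come from the text above.

```typescript
type ErrorCategory = "System" | "Business" | "Validation" | "External";

// Derives the category from the hundreds digit of a three-digit code (assumed format).
function categoryOf(code: number): ErrorCategory | undefined {
  switch (Math.floor(code / 100)) {
    case 1: return "System";
    case 2: return "Business";
    case 3: return "Validation";
    case 4: return "External";
    default: return undefined;
  }
}

// Hints quoted in the text above, keyed by error name.
const COACH_HINTS: { [errorName: string]: string } = {
  SpeechRecognitionNoMatch: "speak more clearly",
  SpeechAudioConvertFailed: "check your microphone",
};

// Falls back to a generic message (assumed wording) for unmapped errors.
function coachHint(errorName: string): string {
  return Object.prototype.hasOwnProperty.call(COACH_HINTS, errorName)
    ? COACH_HINTS[errorName]
    : "please try again";
}
```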

💳 Stripe Payment Integration

The Stripe CustomerSession API handles secure payment processing. Features include:

3D Secure Auth · Purchase Orders · Usage-Based Billing · Per-Minute Pricing

B2B Support: Institutional billing with bulk subscription management. StripeMetadata preserves customer context; for example, { 'from': 'thinkandspeak', 'activeProfile': 'true' } records platform-specific metadata.

reCAPTCHA Integration: Google reCAPTCHA v2/v3 verification protects account creation, assessment onboarding, and subscription activation, ensuring authentic learners complete the personalized assessment.

🏫 Operational Monitoring

Microsoft Teams webhook integration provides real-time alerts for classroom session monitoring, payment processing anomalies, and infrastructure health. Support teams see failures within seconds and can scale up before students notice. Pink alerts are informational; red alerts are urgent.

SECTION 10

The Complete Learning Ecosystem

Assessment → Practice → Feedback → Real-Time Classroom → AI Coach → Mastery

📝 Assessment (Profile Capture) → 📚 Vocabulary (5-Stage Pipeline) → 🎤 Pronunciation (Phoneme Analysis) → 📖 Reading (Exam Prep) → 🌐 Translation (Real-Time) → 🤖 AI Coach (Feedback)

👨‍🏫 For Teachers

Live Translation eliminates classroom language barriers
Engagement heatmaps show student comprehension in real-time
AI Coach provides specific, actionable feedback
Broadcast tools for targeted content delivery

👩‍🎓 For Students

Phoneme-level pronunciation feedback (not just 'right/wrong')
Personalized vocabulary management across devices
Exam simulation with HKDSE timing standards
24/7 AI tutor availability for practice

🏫 For Schools

Enterprise security: AES-256, JWT, reCAPTCHA
99.9% uptime with multi-LLM fallback
GDPR/FERPA compliance built-in
Thinkific LMS integration—zero disruption

'Every word, analyzed sound by sound — turning every class into a reusable learning asset.'

🎯 Start Your Free Trial →

Teacher: test_teacher@seechange-edu.com · Password: Aa123456
