Spring Boot 3.3.10 · Java 21 · Redis · Azure Speech Services · WebSocket Real-Time Communication
The platform runs on Spring Boot 3.3.10 and Java 21, using reactive programming for real-time educational interactions. It is production infrastructure built for the reliability schools need, not a startup prototype, and automatic failover keeps server issues from wasting class time.
A Spring Cache layer reduces database load during high-traffic exam-preparation periods by caching frequently accessed learning paths.
The platform maintains a pool of Azure OpenAI endpoints, and continuous heartbeat monitoring detects node failures. When a node fails, traffic is rerouted to the remaining healthy instances. Only when every Azure node is unavailable does the system fall back to the ZhiPuAI GLM-4-Flash model.
This tiered approach prioritizes quality while ensuring continuity, since schools need technology that works while students are taking mock exams. If the primary service has issues, the system switches AI providers automatically within milliseconds.
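The routing rule described above can be sketched in Java as follows. This is a minimal sketch under stated assumptions:

- The `Endpoint` and `FailoverRouter` classes are hypothetical.
- Health flags are assumed to be updated by the heartbeat monitor.
- Healthy Azure endpoints are assumed to be used round-robin.

The platform's actual balancing policy is not specified.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical endpoint handle; a heartbeat monitor would call setHealthy(). */
final class Endpoint {
    private final String name;
    private volatile boolean healthy = true;

    Endpoint(String name) { this.name = name; }

    String name() { return name; }
    boolean isHealthy() { return healthy; }
    void setHealthy(boolean healthy) { this.healthy = healthy; }
}

/** Round-robins over healthy Azure endpoints and uses the fallback only when none are healthy. */
final class FailoverRouter {
    private final List<Endpoint> azurePool;
    private final Endpoint fallback; // e.g. a ZhiPuAI GLM-4-Flash client
    private final AtomicInteger cursor = new AtomicInteger();

    FailoverRouter(List<Endpoint> azurePool, Endpoint fallback) {
        this.azurePool = List.copyOf(azurePool);
        this.fallback = fallback;
    }

    Endpoint route() {
        int n = azurePool.size();
        if (n == 0) {
            return fallback;
        }
        int start = Math.floorMod(cursor.getAndIncrement(), n);
        for (int i = 0; i < n; i++) {
            Endpoint candidate = azurePool.get((start + i) % n);
            if (candidate.isHealthy()) {
                return candidate;
            }
        }
        return fallback; // every Azure node is down
    }
}
```

Round-robin is only one option. Weighted or least-latency selection would fit into `route()` the same way, and the fallback branch would stay unchanged.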
Pronunciation → Spelling → Usage I → Usage II → Image Association · 10 Difficulty Levels · Spaced Repetition
Each task type maps to a specific pedagogical purpose. The progressive difficulty builds competence across multiple dimensions. Students must demonstrate proficiency before advancing.
Spelling: 35% — Orthographic precision indicates deep encoding critical for academic writing in HKDSE.
Pronunciation: 35% — Directly observable skill; primary barrier for Chinese-speaking students.
Usage & Image: 30% — Tests authentic application in real contexts.
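The text gives the weights but not the formula. Assuming each component is scored from 0 to 100 and the overall mark is a weighted average, the combination looks like this. `CompositeScore` is a hypothetical name.

```java
/** Combines per-skill scores (0-100) using the 35/35/30 weights listed above. */
final class CompositeScore {
    static final double SPELLING_WEIGHT = 0.35;
    static final double PRONUNCIATION_WEIGHT = 0.35;
    static final double USAGE_AND_IMAGE_WEIGHT = 0.30;

    static double combine(double spelling, double pronunciation, double usageAndImage) {
        for (double s : new double[] {spelling, pronunciation, usageAndImage}) {
            if (s < 0 || s > 100) {
                throw new IllegalArgumentException("score out of range: " + s);
            }
        }
        return SPELLING_WEIGHT * spelling
                + PRONUNCIATION_WEIGHT * pronunciation
                + USAGE_AND_IMAGE_WEIGHT * usageAndImage;
    }
}
```

For example, spelling 80, pronunciation 60, and usage 90 combine to 28 + 21 + 27 = 76.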
The platform implements a lives system with per-task retry limits. Each retry deducts points, encouraging quality over quantity. Limited attempts create productive struggle.
Each level maps to authentic HKDSE difficulty expectations. The platform implements progressive difficulty aligned with Hong Kong's English Language curriculum. Levels escalate from 'First Words' (basic everyday vocabulary) through 'Simple Words,' 'Basic Words,' 'Learning Words,' 'Intermediate Words,' 'Advanced Words,' 'Expert Words,' 'Master Words,' 'Specialist Words,' to 'Extreme Words' (HKDSE-level academic vocabulary). Each level contains curated word lists specifically selected for that proficiency tier.
The Cambridge Dictionary integration provides authoritative definitions with IPA phonetic transcription and contextual examples. Students can filter captured vocabulary by specific Live Translation lessons, and the system maintains the connection between vocabulary and its originating content for contextual review—directly supporting the 'seamless content integration' objective.
Azure Speech Services · RecordRTC · 16kHz Sample Rate · WAV Lossless · Phoneme-by-Phoneme Scoring
The system uses the RecordRTC library with StereoAudioRecorder for cross-browser compatibility. Recording parameters: sampleRate: 16000 (optimized for speech), mimeType: 'audio/wav' (lossless quality), maxDuration: 10 seconds.
The 300ms debounce guard prevents rapid double-clicks that could corrupt recording state. The checkMicrophone() function triggers the browser's microphone-permission dialog. Recording starts and stops based on voice activity detection.
Format Normalization: The AudioConvert.java FFmpeg wrapper accepts browser recording formats (WebM) and converts them to Azure-compatible WAV (PCM 16-bit, 16kHz, mono). Duration analysis via ffprobe supports HKDSE timing standards, and precision audio cutting isolates specific mispronounced segments for focused practice.
Voice Activity Detection: The VADFrameDetector uses RMS (energy) and ZCR (zero-crossing rate) analysis. Its adaptive noise floor is bootstrapped from roughly the first second of audio and then updated with exponential smoothing as classroom acoustics change. The design targets typical classroom noise such as coughing, chair-scraping, and air-conditioning hum.
Segmentation: Raw PCM streams are split into pedagogically useful segments, with 300ms of padding before and after each segment and a 400ms minimum segment length. Trimming removes dead air while preserving natural pauses.
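The 300ms double-click guard can be expressed as a leading-edge debounce. The first click acts immediately, and any click arriving within 300ms of the previous click is ignored. The real guard runs in the browser; the sketch below is in Java for consistency with the rest of this document. `ClickDebouncer` is a hypothetical name, and timestamps are passed in explicitly so the logic is easy to test.

```java
/** Leading-edge debounce: a click is accepted only if no click arrived in the previous windowMillis. */
final class ClickDebouncer {
    private final long windowMillis;
    private boolean seenAny = false;
    private long lastClickAt;

    ClickDebouncer(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Returns true if this click should toggle recording; every call restarts the quiet window. */
    synchronized boolean tryAccept(long nowMillis) {
        boolean accept = !seenAny || nowMillis - lastClickAt >= windowMillis;
        seenAny = true;
        lastClickAt = nowMillis;
        return accept;
    }
}
```

Every click restarts the window, so a burst of rapid clicks triggers only once. A throttle would instead measure from the last accepted click.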
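The conversion target above maps directly onto standard FFmpeg options. The sketch below only builds argument lists; the method names are hypothetical and are not AudioConvert.java's actual API.

- `-ac 1` selects mono.
- `-ar 16000` selects a 16 kHz sample rate.
- `-c:a pcm_s16le` selects 16-bit little-endian PCM.
- The ffprobe invocation prints only the container duration in seconds.

```java
import java.nio.file.Path;
import java.util.List;

/** Builds FFmpeg/ffprobe argument lists for the conversions described above. */
final class AudioConvertCommands {
    /** Any FFmpeg-readable input (e.g. WebM) to WAV: 16-bit PCM, 16 kHz, mono. */
    static List<String> toAzureWav(Path input, Path output) {
        return List.of(
                "ffmpeg", "-y",          // overwrite the output file without prompting
                "-i", input.toString(),
                "-ac", "1",              // mono
                "-ar", "16000",          // 16 kHz sample rate
                "-c:a", "pcm_s16le",     // 16-bit little-endian PCM
                output.toString());
    }

    /** Prints only the container duration, in seconds, on stdout. */
    static List<String> probeDuration(Path input) {
        return List.of(
                "ffprobe", "-v", "error",
                "-show_entries", "format=duration",
                "-of", "default=noprint_wrappers=1:nokey=1",
                input.toString());
    }

    /** Extracts durationSeconds of audio starting at startSeconds, re-encoded as 16-bit PCM. */
    static List<String> cut(Path input, Path output, double startSeconds, double durationSeconds) {
        return List.of(
                "ffmpeg", "-y",
                "-i", input.toString(),
                "-ss", String.valueOf(startSeconds),
                "-t", String.valueOf(durationSeconds),
                "-c:a", "pcm_s16le",
                output.toString());
    }
}
```

Each list can be passed to `new ProcessBuilder(...)` and run with `start().waitFor()`. Placing `-ss` after `-i` makes FFmpeg decode up to the cut point before writing. This is slower on long files but precise.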
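The sketch below shows a frame-level detector that combines RMS energy and zero-crossing rate with an adaptively tracked noise floor. Its structure follows the description above: the mean of the first frames serves as a bootstrap, then exponential smoothing is applied on non-speech frames.

Every threshold, the `SimpleVad` name, and the ZCR rule are assumptions, not VADFrameDetector's actual values. Transient noises such as coughs can still exceed an energy threshold. Practical systems therefore add duration or hangover rules on top of frame decisions.

```java
/** Frame-level VAD sketch: RMS energy + zero-crossing rate against an adaptive noise floor. */
final class SimpleVad {
    private static final double MIN_FLOOR = 1e-4; // avoids a zero floor on digitally silent input

    private final int bootstrapFrames;  // e.g. 50 frames of 20 ms is roughly 1 s
    private final double alpha;         // EMA smoothing factor for the noise floor
    private final double energyRatio;   // speech requires rms > energyRatio * noiseFloor
    private final double maxZcr;        // very high ZCR suggests hiss rather than voiced speech

    private int framesSeen = 0;
    private double noiseFloor = 0.0;

    SimpleVad(int bootstrapFrames, double alpha, double energyRatio, double maxZcr) {
        this.bootstrapFrames = bootstrapFrames;
        this.alpha = alpha;
        this.energyRatio = energyRatio;
        this.maxZcr = maxZcr;
    }

    /** Root-mean-square amplitude of 16-bit PCM, normalized to [0, 1]. */
    static double rms(short[] frame) {
        if (frame.length == 0) return 0.0;
        double sum = 0.0;
        for (short s : frame) {
            double x = s / 32768.0;
            sum += x * x;
        }
        return Math.sqrt(sum / frame.length);
    }

    /** Fraction of adjacent sample pairs whose sign differs. */
    static double zcr(short[] frame) {
        if (frame.length < 2) return 0.0;
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Classifies one frame; bootstrap frames are always reported as non-speech. */
    boolean isSpeech(short[] frame) {
        double energy = rms(frame);
        if (framesSeen < bootstrapFrames) {
            framesSeen++;
            noiseFloor += (energy - noiseFloor) / framesSeen; // running mean of the opening frames
            return false;
        }
        double floor = Math.max(noiseFloor, MIN_FLOOR);
        boolean speech = energy > energyRatio * floor && zcr(frame) < maxZcr;
        if (!speech) {
            noiseFloor = alpha * energy + (1 - alpha) * noiseFloor; // adapt only on non-speech frames
        }
        return speech;
    }
}
```

Updating the floor only on non-speech frames keeps a long utterance from raising the threshold that is meant to detect it.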
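Given per-frame speech decisions, the padding and minimum-length rules can be applied as below. This sketch makes two interpretive assumptions:

- The 400ms minimum applies to the raw speech run before padding.
- Segments whose padding overlaps are merged.

`Segmenter` and `Segment` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** A time span in milliseconds, [startMs, endMs). */
record Segment(long startMs, long endMs) {}

final class Segmenter {
    static List<Segment> segment(boolean[] speech, long frameMs, long padMs, long minSpeechMs) {
        long totalMs = speech.length * frameMs;
        List<Segment> out = new ArrayList<>();
        int i = 0;
        while (i < speech.length) {
            if (!speech[i]) {
                i++;
                continue;
            }
            int start = i;
            while (i < speech.length && speech[i]) i++;
            long runStart = start * frameMs;
            long runEnd = i * frameMs;
            if (runEnd - runStart < minSpeechMs) {
                continue; // too short to keep, e.g. a click or a cough
            }
            long s = Math.max(0, runStart - padMs);
            long e = Math.min(totalMs, runEnd + padMs);
            if (!out.isEmpty() && s <= out.get(out.size() - 1).endMs()) {
                // Padding overlaps the previous segment: merge so the natural pause stays intact
                Segment prev = out.remove(out.size() - 1);
                out.add(new Segment(prev.startMs(), e));
            } else {
                out.add(new Segment(s, e));
            }
        }
        return out;
    }
}
```

With 100ms frames, a 500ms speech run starting at 1000ms becomes the segment 700-1800ms. A 200ms blip is discarded.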
91-100 · Excellent: Perfect pronunciation, native-like accuracy
71-90 · Great: Minor deviations, highly intelligible
61-70 · Good Try: Noticeable errors, more practice needed
51-60 · Not Bad: Significant errors affect clarity
1-50 · Needs Work: Multiple phoneme errors detected
0 · Gray: No speech detected
Hard Gate: When completeness falls below 40%, the overall score is capped to prevent inflated assessments. Non-linear gamma compression reshapes the score distribution so that scores reflect pedagogical accuracy rather than raw confidence metrics.
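The sketch below combines the hard gate, gamma compression, and the display bands into one post-processing pipeline. The text does not give the gamma exponent, the cap applied below 40% completeness, or the order of the two steps, so all three are parameters or assumptions here. `PronunciationScoring` is a hypothetical name.

```java
/** Post-processing sketch: gamma compression, completeness hard gate, then display band. */
final class PronunciationScoring {
    enum Band { NO_SPEECH, NEEDS_WORK, NOT_BAD, GOOD_TRY, GREAT, EXCELLENT }

    /** Maps raw 0-100 to 100 * (raw / 100)^gamma; 0 and 100 are fixed points. */
    static double gammaCompress(double raw, double gamma) {
        double clamped = Math.max(0, Math.min(100, raw));
        return 100 * Math.pow(clamped / 100, gamma);
    }

    /** Caps the score when completeness (0-100) is below 40. */
    static double applyCompletenessGate(double score, double completeness, double cap) {
        return completeness < 40 ? Math.min(score, cap) : score;
    }

    /** Display bands from the table above. */
    static Band band(long score) {
        if (score <= 0) return Band.NO_SPEECH;
        if (score <= 50) return Band.NEEDS_WORK;
        if (score <= 60) return Band.NOT_BAD;
        if (score <= 70) return Band.GOOD_TRY;
        if (score <= 90) return Band.GREAT;
        return Band.EXCELLENT;
    }

    static Band evaluate(double raw, double completeness, double gamma, double cap) {
        double s = applyCompletenessGate(gammaCompress(raw, gamma), completeness, cap);
        return band(Math.round(s));
    }
}
```

With gamma above 1, mid-range raw scores are pulled down while 0 and 100 stay fixed; gamma below 1 does the opposite.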
Word Extraction · Entity Recognition · Semantic Search · Auto-Flashcard Pipeline
The search system implements splitIgnoreCases to split query text while ignoring case. Results display with ±50 character context windows around matches. This is critical for HKDSE preparation—students need vocabulary in context, not isolated word lists.
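A case-insensitive match with a fixed context window can be sketched as below. `ContextSearch.snippets` is a hypothetical stand-in for the search described above, not the splitIgnoreCases implementation. It uses `String.regionMatches` with `ignoreCase` set to true, so match positions index directly into the original text, and the caller passes 50 as the radius.

```java
import java.util.ArrayList;
import java.util.List;

/** Case-insensitive search returning each match with up to `radius` characters of context per side. */
final class ContextSearch {
    static List<String> snippets(String text, String query, int radius) {
        List<String> out = new ArrayList<>();
        int m = query.length();
        if (m == 0) return out;
        int i = 0;
        while (i + m <= text.length()) {
            // Compares in place, so i is a valid index into the original (un-lowercased) text
            if (text.regionMatches(true, i, query, 0, m)) {
                int start = Math.max(0, i - radius);
                int end = Math.min(text.length(), i + m + radius);
                out.add(text.substring(start, end));
                i += m; // continue after this match (non-overlapping)
            } else {
                i++;
            }
        }
        return out;
    }
}
```

Usage: `ContextSearch.snippets(transcript, "ubiquitous", 50)` returns one snippet per occurrence, each clipped at the start and end of the transcript.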
Messages are sorted in descending chronological order by their createTime values, which dayjs converts to millisecond timestamps with valueOf(). Room-based message aggregation creates a 'learning journal' in which every class discussion is searchable and reviewable.
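On the server side, the room-based journal amounts to grouping by room ID with newest-first ordering. The `ChatMessage` record and `LearningJournal` class are hypothetical. `Instant` stands in for the createTime value that the client sorts with dayjs.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

record ChatMessage(String roomId, Instant createTime, String text) {}

final class LearningJournal {
    /** Groups messages by room (rooms in key order), newest message first within each room. */
    static Map<String, List<ChatMessage>> byRoomNewestFirst(List<ChatMessage> messages) {
        return messages.stream()
                .sorted(Comparator.comparing(ChatMessage::createTime).reversed())
                .collect(Collectors.groupingBy(ChatMessage::roomId, TreeMap::new, Collectors.toList()));
    }
}
```

Sorting before grouping works because a sequential `groupingBy` adds elements to each room's list in encounter order.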
The separateWords function implements the extraction engine:
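words = text.split(/\s+/).filter(Boolean) → Split on whitespace, dropping empty tokens
words = words.filter(w => !/[^\p{L}]/u.test(w)) → Drop any token that contains a non-letter character (punctuation, digits, symbols)
words = words.filter(w => !/\d/.test(w)) → Drop tokens containing digits (already excluded by the previous filter, since digits are not \p{L} letters)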
The Unicode-aware pattern /[^\p{L}]/u recognizes letters in every script, so tokens are not mangled when students encounter English, Cantonese, and Mandarin within the same session. Because \p{L} also matches Han characters, and whitespace splitting leaves unspaced Chinese runs as single tokens, keeping only English words depends on a script check beyond this regex. This is 'structured vocabulary capture', where every new word becomes a learning opportunity automatically tagged with time, topic, and lesson context.
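To run the extraction pipeline outside the browser, here is a Java port of the filter chain above. It is a hypothetical `WordExtractor`, not production code, and it includes a guard for the empty token that leading whitespace produces. Java's `\p{L}` has the same meaning as JavaScript's `\p{L}` under the `u` flag.

Running it shows two properties of the chain:

- A token with attached punctuation, such as `Hello,`, is dropped entirely rather than cleaned.
- Han characters pass the letter check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Java port of the separateWords filter chain (illustrative, not production code). */
final class WordExtractor {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern NON_LETTER = Pattern.compile("[^\\p{L}]");
    private static final Pattern DIGIT = Pattern.compile("\\d");

    static List<String> separateWords(String text) {
        return Arrays.stream(WHITESPACE.split(text))
                .filter(w -> !w.isEmpty())                  // leading whitespace yields an empty first token
                .filter(w -> !NON_LETTER.matcher(w).find()) // drop tokens containing any non-letter
                .filter(w -> !DIGIT.matcher(w).find())      // redundant after the previous step, kept for parity
                .toList();
    }
}
```

If the intent is to keep `Hello` from `Hello,`, one alternative is to strip non-letters from each token before filtering.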
The platform cross-references captured vocabulary against the student's persistent taggedWordListData. Known words receive green highlighting, signaling 'known territory.' New words appear in blue, representing the zone of proximal development.
The getWordEntity and hasWordEntity functions enable case-insensitive matching. Clicking recognized words triggers instant practice via LiveWordPracticeDialog.
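The green/blue split reduces to a case-insensitive set lookup. In the hypothetical `VocabularyHighlighter` below, the student's tagged words are lowercased once into a set. This follows the same idea as the case-insensitive matching in getWordEntity and hasWordEntity, but it is not their actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class VocabularyHighlighter {
    enum Highlight { KNOWN_GREEN, NEW_BLUE }

    /** Lowercases the student's tagged words once so lookups are case-insensitive. */
    static Set<String> normalize(List<String> taggedWords) {
        return taggedWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    /** Marks each captured word green if already known, blue if new; keeps capture order. */
    static Map<String, Highlight> classify(List<String> captured, Set<String> knownLowercase) {
        Map<String, Highlight> result = new LinkedHashMap<>();
        for (String word : captured) {
            boolean known = knownLowercase.contains(word.toLowerCase(Locale.ROOT));
            result.put(word, known ? Highlight.KNOWN_GREEN : Highlight.NEW_BLUE);
        }
        return result;
    }
}
```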
The categorization system turns vocabulary lists into personalized learning collections:
Full CRUD taxonomy: Students create custom categories such as 'Presentation Skills' or 'DSE Prep' and assign words to them. The slugifyTag function turns each category name into a normalized tag key, so there is no fixed limit on the number of custom categories.
Dual-predicate filtering: Apply both category tags and full-text search simultaneously. The filteredWords computation enables real-time filtering without triggering full component re-renders.
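Below is one plausible shape for slugifyTag and the dual-predicate filter. The slug rules are an assumption, chosen so that Chinese category names survive intact:

- Lowercase the name.
- Collapse each run of non-letter, non-digit characters into a hyphen.
- Trim hyphens from both ends.

`TaggedWord` and `VocabularyFilters` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

record TaggedWord(String word, Set<String> tags) {}

final class VocabularyFilters {
    /** Hypothetical slug rules: "DSE Prep!" becomes "dse-prep"; letters in any script are kept. */
    static String slugifyTag(String name) {
        return name.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", "-")
                .replaceAll("^-+|-+$", "");
    }

    /** Applies the category predicate and the text predicate together; null or empty means no constraint. */
    static List<TaggedWord> filter(List<TaggedWord> words, String tagSlug, String query) {
        String q = query == null ? "" : query.toLowerCase(Locale.ROOT);
        return words.stream()
                .filter(w -> tagSlug == null || tagSlug.isEmpty() || w.tags().contains(tagSlug))
                .filter(w -> q.isEmpty() || w.word().toLowerCase(Locale.ROOT).contains(q))
                .toList();
    }
}
```

Both predicates run in one pass, so adding a search term never requires re-fetching the category's word list.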
Hybrid Listening-Speaking · Web Speech API · Silence Detection · Integrated Sentence-Dictation System
The SentencePickingPractice component combines listening and speaking in a single task. Students hear a sentence read aloud, select the correct option, then record their own pronunciation. This mirrors authentic language use where comprehension precedes production.
The system conditionally loads CompatibleSpeechRecognition (Web Speech API polyfill) when isEnableSpeechRecognition is true. The recognition object is configured with continuous mode, en-US language, and interimResults disabled. When speech is detected, the system starts silence-duration counting via startTimerDetectSilenceDurationCounting().
Four independent timer systems manage the learning flow:
1. Auto-Start Timer: 3-5 second countdown before recording begins, reducing anxiety and providing think-time.
2. Answer Timer: Enforces maximum recording duration, preventing indefinite recording.
3. Silence Detector: Stops recording once no speech has been detected for a configurable number of milliseconds. This is particularly valuable for classroom management, because students who are stuck or distracted do not hold up the session.
4. Auto-Progress Timer: Automatic advancement to next word after correct answer, maintaining learning momentum.
Each timer can be independently configured per learning scenario.
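The answer timer and the silence detector can share one piece of state: when recording started and when speech was last heard. The hypothetical `RecordingGuard` below starts silence counting only after the first speech event. This matches the behavior described for startTimerDetectSilenceDurationCounting(). The auto-start and auto-progress timers are simple countdowns and are omitted.

```java
/** Combines the answer timer (max duration) with the silence detector. Times are in milliseconds. */
final class RecordingGuard {
    enum StopReason { NONE, MAX_DURATION, SILENCE }

    private final long maxDurationMs;
    private final long silenceMs;
    private long startedAt;
    private boolean heardSpeech;
    private long lastSpeechAt;

    RecordingGuard(long maxDurationMs, long silenceMs) {
        this.maxDurationMs = maxDurationMs;
        this.silenceMs = silenceMs;
    }

    void start(long nowMs) {
        startedAt = nowMs;
        heardSpeech = false;
    }

    /** Called whenever the recognizer reports speech; resets the silence clock. */
    void onSpeech(long nowMs) {
        heardSpeech = true;
        lastSpeechAt = nowMs;
    }

    /** Polled periodically (e.g. from a timer tick) to decide whether to stop recording. */
    StopReason check(long nowMs) {
        if (nowMs - startedAt >= maxDurationMs) return StopReason.MAX_DURATION;
        if (heardSpeech && nowMs - lastSpeechAt >= silenceMs) return StopReason.SILENCE;
        return StopReason.NONE;
    }
}
```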
The platform implements an integrated sentence-dictation system linking spoken input, speech recognition, and automatic assessment. Students listen to model sentences and practice dictation in a structured sequence.
Trigger: After all words are pronounced, the system fetches task data via callGetSentencePickingPracticeUsage, presents multiple choice options, validates selection (highlighting correct/incorrect), then plays model audio before prompting recording.
The component implements three audio modes:
1. TTS Playback: Uses the useAudio hook from react-use for word pronunciation examples.
2. Student Recording: Uses MediaRecorder architecture identical to other components for consistency.
3. Auto-Play Sequencing: Implements timerPlayNextAudioRef for chain-playing word audio, definitions, and example sentences in the 'Audio Tour' experience.
This creates the 'hear → practice → compare' learning loop where students listen, record, then replay both audio tracks.
Points Economy · Achievement System · Mastery Classification · Activity Telemetry
The UnlockCategoryDialog implements a points-based progression system. Students earn points through practice, which unlock new vocabulary categories. The Progress component from Ant Design displays accumulated points with custom gradients.
The progress bar uses linear-gradient(90deg, #3695E5 0%, #90fc9b 51.92%, #ffcb51 98.08%) representing the journey from beginner (blue) to mastery (gold). The mascot character with thumbs-up creates emotional connection.
The binary mastery system separates Active Learning (score ≤ 90) from Archived Mastered (score > 90). This implements cognitive load management:
Students should not be overwhelmed by seeing 500 words every session. By separating mastered from active, students focus on what needs practice.
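The ≤ 90 / > 90 split maps directly onto `Collectors.partitioningBy`. `WordProgress` and `MasteryPartition` are hypothetical names; the threshold is the one stated above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

record WordProgress(String word, int score) {}

final class MasteryPartition {
    static final int MASTERY_THRESHOLD = 90; // mastered when score > 90

    /** true: Archived Mastered; false: Active Learning. Both keys are always present. */
    static Map<Boolean, List<WordProgress>> partition(List<WordProgress> words) {
        return words.stream()
                .collect(Collectors.partitioningBy(w -> w.score() > MASTERY_THRESHOLD));
    }
}
```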
Every user interaction is tracked through reportActivityLog.
This telemetry feeds the AI Coach's personalization engine. The entity data captures categoryId, levelId, wordId, and vocabWord for granular analytics.
The vocabulary system is the hub that connects every platform module. When students use Live Translation in class, their captured words automatically appear here, and their pronunciation practice feeds back to the AI Coach, which adjusts future recommendations.
WebSocket · Redis Sessions · Bilingual Subtitles · Engagement Heatmap · Teacher Broadcast Tools
The platform's HybridEncryptor secures Azure access tokens before transmission to student devices. Tokens are embedded in presigned URLs with configurable expiry. The IP2Location service maintains monthly-updated IP prefix trees enabling O(log n) geolocation lookups.
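IP2Location data is commonly distributed as numeric IP ranges. An O(log n) lookup only needs a search structure ordered by range start. This sketch swaps in a `TreeMap` with `floorEntry` (a balanced search tree) in place of a prefix trie, handles IPv4 only, and uses the hypothetical name `IpRangeIndex`.

```java
import java.util.Map;
import java.util.TreeMap;

/** IPv4 range lookup in O(log n) using TreeMap.floorEntry. */
final class IpRangeIndex {
    private record Range(long end, String countryCode) {}

    private final TreeMap<Long, Range> byStart = new TreeMap<>();

    /** Converts dotted-quad IPv4 to an unsigned 32-bit value held in a long. */
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) throw new IllegalArgumentException("not IPv4: " + ipv4);
        long value = 0;
        for (String p : parts) {
            int octet = Integer.parseInt(p);
            if (octet < 0 || octet > 255) throw new IllegalArgumentException("bad octet: " + p);
            value = (value << 8) | octet;
        }
        return value;
    }

    /** Registers an inclusive range; ranges are assumed not to overlap. */
    void addRange(String fromIp, String toIp, String countryCode) {
        byStart.put(toLong(fromIp), new Range(toLong(toIp), countryCode));
    }

    /** Returns the country code for ip, or null when no range contains it. */
    String lookup(String ip) {
        long key = toLong(ip);
        Map.Entry<Long, Range> e = byStart.floorEntry(key);
        return (e != null && key <= e.getValue().end()) ? e.getValue().countryCode() : null;
    }
}
```

A monthly refresh would build a new index from the updated file and swap the reference atomically, so lookups never see a half-loaded table.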
The WebSocket architecture is designed for sub-second latency, and Redis-backed session management keeps transcription synchronized in real time. Each classroom session gets a CharRoom:{sessionId} key storing transcription buffers, engagement signals, and broadcast messages.
The heatmap tracks three student signals:
✓ Understanding — Student confirms comprehension
❓ Question — Student signals confusion
○ Neutral — No explicit signal sent
The UserAuthorityEnum distinctions route messages: feedback channels for students aggregate into heatmap visualizations; broadcast channels for teachers enable announcements. Teachers can see at a glance which students need support.
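Turning the three signals into heatmap counts is a small reduction over each student's latest signal. In the hypothetical `EngagementHeatmap` below, students who have sent nothing count as Neutral. Anyone whose latest signal is a question is flagged for support.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

enum Signal { UNDERSTANDING, QUESTION, NEUTRAL }

final class EngagementHeatmap {
    private final Map<String, Signal> latestByStudent = new LinkedHashMap<>();

    /** Keeps only each student's most recent signal. */
    void submit(String studentId, Signal signal) {
        latestByStudent.put(studentId, signal);
    }

    /** Counts per signal across the roster; students who never signaled count as NEUTRAL. */
    Map<Signal, Integer> counts(List<String> roster) {
        Map<Signal, Integer> counts = new EnumMap<>(Signal.class);
        for (Signal s : Signal.values()) counts.put(s, 0);
        for (String id : roster) {
            counts.merge(latestByStudent.getOrDefault(id, Signal.NEUTRAL), 1, Integer::sum);
        }
        return counts;
    }

    /** Students whose latest signal is a question, in roster order. */
    List<String> needsSupport(List<String> roster) {
        return roster.stream()
                .filter(id -> latestByStudent.get(id) == Signal.QUESTION)
                .toList();
    }
}
```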
The 'Daily Reading' feature uses Redis caching for fast delivery. The system pre-caches level-appropriate passages under composite keys built from a REDIS_PREFIX_DAILY_SCRIPT prefix and the student's proficiency tier.
MyBatis-Plus SFunction references enable type-safe dynamic queries by lexical density and syntactic complexity, which helps the 'personalized based on assessed level' promise scale to thousands of concurrent students.
The exam mode simulates HKDSE Paper 3 (Listening and Integrated Skills) conditions:
pronunciationAssessmentForWord
Continuous mode for reading passages
PhraseListGrammar for 'strong hints' based on the expected text
Response tracking: timeOnTask, questionResponseTime, hesitationPatterns
Students practice under timed pressure so examination day feels familiar.
The Mailgun integration with bilingual HTML templates ensures students receive timely, branded communications:
Templates feature responsive HTML with Think & Speak brand identity. Green action buttons (#4CAF50) with clear expiration warnings. Personalized variables include student name, assigned AI tutor persona, and first milestone objective.
RBAC · Multi-Tenant Isolation · AES-256 Encryption · GDPR/FERPA · Stripe Payments
The hierarchical permission model supports the complex stakeholder ecosystem of Hong Kong schools: students, teachers, school administrators, and LMS-integrated users.
Each school operates as an independent tenant with strict data segregation. The schoolId filtering at database query level ensures School A's data never appears in School B's dashboard.
🔒 Each school's recordings, vocabulary, and analytics are completely isolated
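A combined tenant and role check can be sketched as follows. The role ranks and the exact-match rule for school IDs are assumptions. LMS-integrated users would need to be mapped onto one of these roles. In production, the schoolId condition belongs in the database query itself, as described above, with a check like this as a second line of defense.

```java
/** Hypothetical role hierarchy; a higher rank inherits the access of lower ranks. */
enum Role {
    STUDENT(1), TEACHER(2), SCHOOL_ADMIN(3);

    final int rank;

    Role(int rank) { this.rank = rank; }
}

record Principal(String userId, String schoolId, Role role) {}

final class TenantGuard {
    /** Grants access only inside the caller's own school and only at or above the required role. */
    static boolean canAccess(Principal caller, String resourceSchoolId, Role required) {
        return caller.schoolId().equals(resourceSchoolId)
                && caller.role().rank >= required.rank;
    }
}
```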
The HybridEncryptor implementation uses:
AES-256-GCM: Provides confidentiality and integrity for voice recordings and student data. Its authentication tag makes tampering detectable, which supports academic integrity. Voice recordings are encrypted at rest.
RSA-OAEP-SHA256: Wraps the AES key so that only the holder of the RSA private key can recover it.
Base64URL Encoding: Ensures URL-safe API transmission.
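The three building blocks above combine into standard hybrid encryption:

- A fresh AES-256 key encrypts the payload under GCM.
- RSA-OAEP with SHA-256 (and MGF1-SHA-256) wraps that key.
- Each part is Base64URL-encoded.

This sketch uses only JDK APIs and is not HybridEncryptor's actual code. The `wrappedKey.iv.ciphertext` token layout is an assumption. Tampering is detected by the GCM authentication tag during decryption.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.security.spec.MGF1ParameterSpec;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.OAEPParameterSpec;
import javax.crypto.spec.PSource;

/** AES-256-GCM payload encryption with an RSA-OAEP-SHA256-wrapped key, Base64URL output. */
final class HybridEncryptorSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final OAEPParameterSpec OAEP_SHA256 = new OAEPParameterSpec(
            "SHA-256", "MGF1", MGF1ParameterSpec.SHA256, PSource.PSpecified.DEFAULT);
    private static final Base64.Encoder ENCODER = Base64.getUrlEncoder().withoutPadding();
    private static final Base64.Decoder DECODER = Base64.getUrlDecoder();

    static String encrypt(byte[] plaintext, PublicKey recipient) throws GeneralSecurityException {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(256);
        SecretKey aesKey = kg.generateKey();              // fresh key per message

        byte[] iv = new byte[12];                         // 96-bit nonce, never reused with a key
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(plaintext);       // ciphertext with 128-bit auth tag appended

        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPPadding");
        rsa.init(Cipher.WRAP_MODE, recipient, OAEP_SHA256);
        byte[] wrappedKey = rsa.wrap(aesKey);

        return ENCODER.encodeToString(wrappedKey) + "."
                + ENCODER.encodeToString(iv) + "."
                + ENCODER.encodeToString(ciphertext);
    }

    static byte[] decrypt(String token, PrivateKey privateKey) throws GeneralSecurityException {
        String[] parts = token.split("\\.");
        if (parts.length != 3) throw new IllegalArgumentException("malformed token");

        Cipher rsa = Cipher.getInstance("RSA/ECB/OAEPPadding");
        rsa.init(Cipher.UNWRAP_MODE, privateKey, OAEP_SHA256);
        SecretKey aesKey = (SecretKey) rsa.unwrap(DECODER.decode(parts[0]), "AES", Cipher.SECRET_KEY);

        Cipher aes = Cipher.getInstance("AES/GCM/NoPadding");
        aes.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, DECODER.decode(parts[1])));
        return aes.doFinal(DECODER.decode(parts[2]));     // throws AEADBadTagException if tampered
    }
}
```

The Base64URL alphabet never contains `.`, so the separator cannot collide with encoded data.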
Compliance: FERPA (US student privacy), GDPR (data protection), COPPA (children's online privacy). The sourceOfReferral field is strictly optional—minimizing data collection for minors. Failed delivery attempts trigger fallback SMS paths for institutional accounts.
Azure OpenAI + ZhiPuAI Fallback · reCAPTCHA · Stripe Integration · Circuit Breakers · 99.9% Uptime
The system maintains a pool of Azure OpenAI endpoints. Continuous heartbeat monitoring detects failures. When a node fails, traffic reroutes to healthy instances. Only when all Azure nodes fail does ZhiPuAI GLM-4-Flash activate as backup.
The culturally aware AI component handles Hong Kong Traditional Chinese orthography, HKDSE standards alignment, and recognition of Cantonese-influenced English patterns.
The system categorizes all errors into four types:
1xx · System: Infrastructure errors, e.g. JWT expiration, network failures
2xx · Business: Business-logic errors, e.g. enrollment status, payment issues
3xx · Validation: User-input errors, e.g. email validation, field formatting
4xx · External: Third-party errors, e.g. Stripe payments, speech recognition
Each error code enables specific AI Coach feedback. For example, SpeechRecognitionNoMatch suggests 'speak more clearly,' while SpeechAudioConvertFailed suggests 'check your microphone.' Codes in the same category share a prefix, and their unique subcodes enable targeted remediation.
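A category prefix plus unique subcodes maps naturally onto an enum. Only the four categories and the two speech error names come from the text. The following are hypothetical:

- the numeric codes
- the `JwtExpired` constant
- the hint strings
- placing audio conversion under 4xx

```java
/** Error codes whose hundreds digit encodes the category and whose hint drives AI Coach feedback. */
enum ErrorCode {
    JwtExpired(101, "Please sign in again."),
    SpeechRecognitionNoMatch(401, "Try speaking more clearly."),
    SpeechAudioConvertFailed(402, "Check your microphone.");

    enum Category { SYSTEM, BUSINESS, VALIDATION, EXTERNAL }

    final int code;
    final String coachHint;

    ErrorCode(int code, String coachHint) {
        this.code = code;
        this.coachHint = coachHint;
    }

    /** 1xx System, 2xx Business, 3xx Validation, 4xx External. */
    Category category() {
        return switch (code / 100) {
            case 1 -> Category.SYSTEM;
            case 2 -> Category.BUSINESS;
            case 3 -> Category.VALIDATION;
            case 4 -> Category.EXTERNAL;
            default -> throw new IllegalStateException("unknown category for " + code);
        };
    }
}
```

The AI Coach can branch on `category()` for broad handling, such as retry versus user action, and show `coachHint` for the specific case.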
The Stripe CustomerSession API handles secure payment processing. Features include:
B2B Support: Institutional billing with bulk subscription management. StripeMetadata attaches platform-specific context to each customer, for example { 'from': 'thinkandspeak', 'activeProfile': 'true' }.
reCAPTCHA Integration: Google reCAPTCHA v2/v3 verification protects account creation, assessment onboarding, and subscription activation, ensuring authentic learners complete the personalized assessment.
Assessment → Practice → Feedback → Real-Time Classroom → AI Coach → Mastery