TECHNICAL WORKSHOP

Think & Speak AI
English Learning Platform

Technical Architecture Deep-Dive for Educators

Spring Boot · React · Azure Cognitive Services · Multi-LLM AI · Enterprise Security

MODULE 1

Enterprise-Grade AI Infrastructure

Built for Thousands of Concurrent Learners — The technical foundation powering the platform's core features

⚡

Cloud-Native Core

Spring Boot 3.3.10 on Java 21 — reactive programming for real-time educational interactions, WebSocket support for live classroom sessions.

Spring WebFlux for non-blocking request handling during peak HKDSE prep season
Flyway schema versioning ensures reliable database migrations
Redis session management with dynamic node allocation and round-robin load balancing
Heartbeat monitoring every 30 seconds, with automatic failover within 100ms of a detected failure
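
The round-robin and heartbeat bullets above can be sketched as follows. This is a minimal illustration in Python, not the platform's Spring Boot code: the `NodePool` class, the `missed_allowed` parameter, and the rule that a node is unhealthy after two missed 30-second heartbeats are all assumptions made for the example.

```python
import time


class NodePool:
    """Round-robin over nodes whose last heartbeat is recent enough."""

    def __init__(self, nodes, heartbeat_interval=30.0, missed_allowed=1,
                 clock=time.monotonic):
        self._nodes = list(nodes)
        self._clock = clock
        # Assumption: a node is unhealthy once it misses more than
        # `missed_allowed` consecutive heartbeats.
        self._timeout = heartbeat_interval * (missed_allowed + 1)
        now = clock()
        self._last_seen = {node: now for node in self._nodes}
        self._cursor = 0

    def heartbeat(self, node):
        """Record that `node` reported in just now."""
        self._last_seen[node] = self._clock()

    def _is_healthy(self, node):
        return self._clock() - self._last_seen[node] <= self._timeout

    def healthy(self):
        return [node for node in self._nodes if self._is_healthy(node)]

    def next_node(self):
        """Return the next healthy node in rotation, skipping stale ones."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._cursor % len(self._nodes)]
            self._cursor += 1
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy nodes")
```

Passing the clock in as a parameter keeps the sketch testable without real waiting; a production balancer would also need thread safety, which is omitted here.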

🧠

Multi-LLM AI Engine

99.9% availability through intelligent fallback. Azure OpenAI for production quality, ZhiPuAI GLM-4-Flash for cultural context and backup.

Primary: Azure OpenAI endpoint pool with continuous health monitoring
Secondary: traffic reroutes to the remaining healthy nodes on failure detection
Tertiary: ZhiPuAI GLM-4-Flash as a culturally aware Hong Kong backup
Automatic provider switching within milliseconds — no classroom disruption
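
The fallback chain above can be sketched as an ordered list of providers that is tried in turn. This is a hedged Python illustration only: `Provider`, `complete_with_fallback`, and the provider names in the usage note are hypothetical, and real health monitoring would run separately from the request path.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # hypothetical "send prompt, get text" call
    healthy: bool = True


class AllProvidersFailed(Exception):
    pass


def complete_with_fallback(providers: List[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in priority order; mark a provider unhealthy when it fails."""
    errors = []
    for provider in providers:
        if not provider.healthy:
            continue  # skip nodes already flagged by an earlier failure
        try:
            return provider.name, provider.complete(prompt)
        except Exception as exc:  # sketch: any error triggers fallback
            provider.healthy = False
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no healthy providers")
```

A chain might be built as `[Provider("azure-primary", call_azure), Provider("zhipu-glm-4-flash", call_zhipu)]`, where `call_azure` and `call_zhipu` stand in for whatever client functions the deployment actually uses.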

🛡️

Media Asset Security

S3-compatible object storage (DigitalSpaces). Presigned URLs expire after 60 minutes, which keeps learner data isolated and access secure.

DigitalSpaces integration for secure media storage
Temporary presigned URLs — no permanent public links to student recordings
Private learner data isolation from public content
FFmpeg-based format normalization (WebM → WAV, 16kHz PCM)
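
The idea behind temporary presigned links can be illustrated with a small expiring-signature sketch. Note the assumptions: this is a simplified HMAC scheme written for illustration, not the S3 Signature Version 4 algorithm that S3-compatible services actually use, and `SECRET`, `sign_url`, and `is_valid` are hypothetical names.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret"  # hypothetical key; real keys live in a secret store


def _signature(path, expires):
    message = f"{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path, ttl_seconds=3600, now=None):
    """Return `path` with an expiry timestamp and signature (60 minutes by default)."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def is_valid(url, now=None):
    """Accept the link only if the signature matches and it has not expired."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(parts.path, expires)) and current < expires
```

Because the path is part of the signed message, a link issued for one student's recording cannot be edited to point at another's.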

MODULE 2

Phoneme-Level Speech Analysis

Beyond Correct/Incorrect — Understanding how students speak, not just what they say

Azure Cognitive Services drives the pronunciation engine — breaking speech into individual phonemes and comparing against native speaker models.

🎤 Audio Capture → 🔄 Format Normalize → 🧠 Azure Engine → 📊 Phoneme Analysis → 🎯 Feedback Display

Three Assessment Modes

Detects specific sound issues like /θ/ vs /s/ confusion in "think" vs "sink." Uses IPA for universal sound representation — the same system used in professional PTE, IELTS, and DSE preparation.

For discrete vocabulary testing with HundredMark grading system integration. Quick pass/fail with granular score breakdown.

For longer passages or presentation scripts. Configurable, with a 3000ms end-silence timeout for younger learners. Captures fluency and pacing over entire paragraphs.

Pedagogical Scoring System

Accuracy: 60%
Fluency: 25%
Completeness: 15%

Hard Gate: Completeness below 40% caps overall score — prevents inflated assessments for word-skipping.
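
As a worked example of the weighting and the hard gate, here is a minimal Python sketch. The 60/25/15 weights and the 40% completeness gate come from the slide; the cap value `GATED_SCORE_CAP = 50.0` is an assumption, since the slide does not state what the overall score is capped at.

```python
WEIGHTS = {"accuracy": 0.60, "fluency": 0.25, "completeness": 0.15}
COMPLETENESS_GATE = 40.0  # from the slide: below this, the score is capped
GATED_SCORE_CAP = 50.0    # assumption: the actual cap is not given in the source


def overall_score(accuracy, fluency, completeness):
    """Combine 0-100 sub-scores into a weighted 0-100 overall score."""
    weighted = (
        WEIGHTS["accuracy"] * accuracy
        + WEIGHTS["fluency"] * fluency
        + WEIGHTS["completeness"] * completeness
    )
    if completeness < COMPLETENESS_GATE:
        # Hard gate: skipping most of the words cannot yield a high score.
        weighted = min(weighted, GATED_SCORE_CAP)
    return round(weighted, 1)
```

With these assumptions, a student who pronounces a few words perfectly but skips most of the passage is held at the cap, while a score that is already below the cap passes through unchanged.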

Example: "exact" → /ɪɡˈzækt/

/ɪ/ Short i ✓

/ɡ/ Hard g ✓

/ˈz/ Z sound ✓

/æ/ A-as-in-cat ⚠

/kt/ K-T ending ✓

Color-coded phoneme feedback — students see exactly which sounds need work

MODULE 3

Voice Signature Assessment

Understanding how students communicate, not just what they know — Psychometric profiling through AI speech analysis

Weighted ipsative scoring across 10 forced-choice ranking dimensions analyzes communication patterns and cognitive styles, creating a personalized learner profile.

Five Core Competencies

🎯

Adaptability (35-95%)

Rewards flexible thinking using multiple communication archetypes

🎨

Expressive Range (35-95%)

Measures vocabulary diversity and rhetorical flexibility

🔊

Fluency/Clarity (35-95%)

Evaluates speech flow and comprehensibility

⚖️

Calibration (35-95%)

Assesses self-awareness in communication contexts

🏔️

Resilience (35-95%)

Measures emotional steadiness — Mountain/Volcano patterns

Archetype Classification

Students are categorized into five profiles using weighted voting:

Visionary
Strategic communicators

Performer
Expressive & dynamic

Nurturer
Supportive collaborators

Organizer
Systematic planners

Thinker
Analytical processors

Personalization Impact: Archetype results drive personalized AI tutor interactions, scenario selection in Conversation Role Play, and adaptive learning pathway generation.

Tone Mix Dimensions

Thud: Grounded
Boing: Energetic
Mom: Nurturing
Pow: Assertive

Underused tones highlighted for targeted practice assignment

MODULE 4

Lexical Intelligence & Vocabulary Builder

From Live Translation captures to AI-generated flashcards — with video context and Cambridge Dictionary definitions

Three-Tier Word Enrichment

🌍

Tier 1: Cambridge Dictionary
IPA phonetics, part-of-speech, contextual examples, English↔Chinese Traditional

↓

🧠

Tier 2: AI Contextual Definition
ZhiPuAI generates definitions when dictionary entries prove insufficient

↓

🎬

Tier 3: Pexels Video Clips
Authentic contextual video — HD MP4 optimized for educational bandwidth

Content Safety: CambridgeDictionary.java maintains regex-based sensitivity filters ensuring age-appropriate vocabulary delivery for primary and secondary deployments.

Vocabulary Pipeline

📝 Capture Word → 📖 Dictionary Lookup → 🎬 Video Context → 🃏 Flashcard Generate → 🧠 Spaced Review

Mastery Classification

Active: Score ≤ 90% (continued practice)
Mastered: Score > 90% (moved to Your Library)

Binary system implements "desirable difficulties" — separating mastered vocabulary from active learning to prevent cognitive overload

MODULE 5

AI-Moderated Group Discussions

From classroom debates to remote study sessions — AI handles moderation while teachers monitor analytics

🏠

Same-Device Multi-Speaker

Voice biometrics (speaker_id differentiation) analyzes group discussions on shared devices. Each participant receives individual pronunciation scores and logic assessment.

Classroom settings with limited hardware — one tablet per group
Generates individual SpeakerScore and UtteranceSegment data
Teacher analytics show participation patterns per group member

🌐

Multi-Device AI Moderation

For remote learning, the system processes audio streams to determine optimal AI intervention points (join_at_seconds). AI generates contextual Socratic responses as audio streams.

Students receive guided questioning during debates — 24/7 availability
Hong Kong DSE speaking format directly supported
AI moderator runs without teacher supervision for homework practice

Live Participation Heatmap

● High participation ● Moderate ● Needs encouragement

Real-time analytics reveal which students dominate discussions versus those who need encouragement
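
One way such a heatmap could be derived from per-speaker utterance segments is sketched below in Python. Everything specific here is an assumption for illustration: the `(speaker_id, start, end)` tuple shape, the `high` and `low` ratio thresholds, and the use of speaking time relative to an equal share as the participation measure.

```python
from collections import defaultdict


def participation_levels(roster, segments, high=1.25, low=0.5):
    """Bucket each speaker by talk time relative to an equal share.

    roster:   list of speaker ids expected in the group
    segments: iterable of (speaker_id, start_seconds, end_seconds)
    """
    talk_time = defaultdict(float)
    for speaker_id, start, end in segments:
        talk_time[speaker_id] += max(0.0, end - start)

    total = sum(talk_time[s] for s in roster)
    if not roster or total == 0:
        return {speaker: "needs encouragement" for speaker in roster}

    fair_share = total / len(roster)
    levels = {}
    for speaker in roster:
        ratio = talk_time[speaker] / fair_share
        if ratio >= high:
            levels[speaker] = "high"
        elif ratio >= low:
            levels[speaker] = "moderate"
        else:
            levels[speaker] = "needs encouragement"
    return levels
```

Including the roster explicitly matters: a student who never speaks produces no segments at all, and would otherwise vanish from the report instead of being flagged.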

Teacher Insight: "How many of you have students who never speak up in group work? The heatmap shows you exactly who's participating and who's quiet, so you can intervene."

MODULE 6

Audio Intelligence Pipeline

Handling real-world classroom conditions — noise, accented speech, and imperfect recordings

🔄

Format Normalization

AudioConvert.java — FFmpeg wrapper

Converts browser WebM recordings to Azure-compatible WAV (PCM 16-bit, 16kHz mono). Duration analysis via ffprobe supports HKDSE timing standards.

WebM streams without headers — specialized parsing
HKDSE timing standards compliance via ffprobe
Precision audio cutting isolates mispronounced segments
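
The conversion and duration steps can be sketched as the command lines an FFmpeg wrapper might build. This Python sketch is an illustration, not the contents of AudioConvert.java: the function names are hypothetical, while the flags themselves are standard `ffmpeg` and `ffprobe` options.

```python
def ffmpeg_wav_command(src, dst):
    """Build an ffmpeg command: any input to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", src,            # input recording, e.g. browser WebM
        "-vn",                # drop any video stream
        "-ac", "1",           # mono
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # signed 16-bit little-endian PCM
        dst,
    ]


def ffprobe_duration_command(path):
    """Build an ffprobe command that prints only the duration in seconds."""
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True, capture_output=True)` on a machine with FFmpeg installed; building the argument list rather than a shell string avoids quoting problems with file names.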

📡

Voice Activity Detection

SpeechVADFrameDetector.java

On-device VAD using RMS (energy) and ZCR (zero-crossing rate) analysis. Adaptive noise floor from first ~1 second, then exponential smoothing (α=0.02).
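
A minimal Python sketch of this kind of detector follows. The RMS and ZCR features, the calibration window, and α = 0.02 come from the slide; the 20ms frame size implied by `calibration_frames=50`, the `energy_ratio` and `max_zcr` thresholds, and the rule that high-ZCR frames are treated as noise are assumptions, and this is not the code in SpeechVADFrameDetector.java.

```python
import math

ALPHA = 0.02  # exponential smoothing factor from the slide


def rms(frame):
    """Root-mean-square energy of one frame of samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def zcr(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    if len(frame) < 2:
        return 0.0
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def detect_speech(frames, calibration_frames=50, energy_ratio=3.0, max_zcr=0.35):
    """Return one boolean per frame: True where the frame looks like speech.

    The first `calibration_frames` frames (about 1 s at 20 ms per frame)
    set the initial noise floor and are never flagged as speech.
    """
    head = frames[:calibration_frames]
    floor = sum(rms(f) for f in head) / len(head) if head else 0.0
    flags = [False] * len(head)
    for frame in frames[calibration_frames:]:
        energy = rms(frame)
        is_speech = energy > energy_ratio * max(floor, 1e-6) and zcr(frame) <= max_zcr
        if not is_speech:
            # Track slow changes in background noise (fans, air conditioning).
            floor = (1 - ALPHA) * floor + ALPHA * energy
        flags.append(is_speech)
    return flags
```

The ZCR check is a deliberate simplification: it rejects hiss-like noise, but real unvoiced consonants also have a high zero-crossing rate, so a production detector would need a more careful rule.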

✂️

Segmentation

VadService.java

300ms pre/post padding preserves natural pauses. 400ms minimum segment threshold removes noise artifacts while keeping meaningful content.
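
The padding and minimum-length rules can be sketched as a pass over per-frame speech flags. This Python sketch assumes 20ms frames and assumes the 400ms minimum is checked before the 300ms padding is applied; the slide does not specify either, and `segments_from_flags` is a hypothetical name rather than a method of VadService.java.

```python
def segments_from_flags(flags, frame_ms=20, pad_ms=300, min_ms=400):
    """Turn per-frame speech flags into padded (start_ms, end_ms) segments."""
    total_ms = len(flags) * frame_ms

    # 1. Collect raw runs of consecutive speech frames.
    runs, start = [], None
    for i, is_speech in enumerate(list(flags) + [False]):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            runs.append((start * frame_ms, i * frame_ms))
            start = None

    # 2. Drop short runs, pad the rest, and merge any overlaps.
    merged = []
    for begin, end in runs:
        if end - begin < min_ms:
            continue  # likely a click, cough, or chair scrape
        begin = max(0, begin - pad_ms)
        end = min(total_ms, end + pad_ms)
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged
```

Merging after padding means two utterances separated by a short pause come out as one segment, which is what keeps natural pauses inside a sentence intact.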

Processing Pipeline

🎤 Browser records WebM via RecordRTC

↓

🔄 FFmpeg converts to WAV (16kHz PCM)

↓

📡 VAD detects speech vs silence

↓

✂️ Segmentation trims silence

↓

🧠 Azure Speech analysis

Real-World Classroom Handling

🌬️

Air conditioner noise rejection

🪑

Chair scraping ignored

🤧

Coughs & interruptions

🔊

Corridor noise

MODULE 7

Semantic Similarity Engine

Understanding student intent, not just exact words — Multi-algorithm ensemble for flexible, natural language matching

Multi-Algorithm Ensemble

Jaro-Winkler: 15%
Levenshtein: 15%
Cosine Similarity: 35%
Monge-Elkan: 35%

Weights favor semantic meaning and phrase structure over exact character matching — students who rearrange words or use synonyms still receive credit.

Pass Threshold: 0.6

A similarity score of 0.6 or higher counts as correct. A higher threshold (0.8) applies to grammar-sensitive assessments.
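
A compact Python sketch of such an ensemble is shown here. The four weights and the 0.6 threshold come from the slide; the details are assumptions made for the example: cosine similarity is computed over bag-of-words counts, Monge-Elkan uses Jaro-Winkler as its inner metric and is averaged in both directions, and normalization is lower-casing plus punctuation and whitespace cleanup.

```python
import math
import re
from collections import Counter

WEIGHTS = {"jaro_winkler": 0.15, "levenshtein": 0.15, "cosine": 0.35, "monge_elkan": 0.35}


def normalize(text):
    """Lower-case, strip punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def jaro(a, b):
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, scaling=0.1):
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def levenshtein_similarity(a, b):
    if not a and not b:
        return 1.0
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + (ca != cb)))
        previous = current
    return 1.0 - previous[-1] / max(len(a), len(b))


def cosine_similarity(a, b):
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def monge_elkan(a, b):
    ta, tb = a.split(), b.split()
    if not ta or not tb:
        return 0.0

    def one_way(xs, ys):
        return sum(max(jaro_winkler(x, y) for y in ys) for x in xs) / len(xs)

    return (one_way(ta, tb) + one_way(tb, ta)) / 2


def similarity(reference, response):
    a, b = normalize(reference), normalize(response)
    scores = {
        "jaro_winkler": jaro_winkler(a, b),
        "levenshtein": levenshtein_similarity(a, b),
        "cosine": cosine_similarity(a, b),
        "monge_elkan": monge_elkan(a, b),
    }
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passes(reference, response, threshold=0.6):
    return similarity(reference, response) >= threshold
```

The two token-level metrics carry 70% of the weight, which is why a reordered sentence still passes: the word set is unchanged even though the character sequence differs. In this sketch all four metrics are lexical, so a true synonym with no shared letters would not earn credit without an additional semantic component.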

Example Student Responses

Reference: "Yesterday I went to the store"

✓ "I went to the store yesterday" — PASS (word reorder)

Reference: "The cat sat on the mat"

✓ "On the mat, the cat sat" — PASS (syntactic variation)

Reference: "The weather is pleasant today"

✗ "I like pizza very much" — FAIL (different topic)

Lower-casing and punctuation removal
Whitespace normalization
Handles imperfect classroom transcription
Teacher speech in Note Capture correctly matches references

0.6 = standard pass for general vocabulary exercises
0.8 = higher bar for grammar-sensitive contexts
Adjustable per exercise type and difficulty level
Override capability for teacher-defined assessments

MODULE 8

Enterprise Security Infrastructure

Protecting student data and assessment integrity — PCI DSS compliant payment processing and enterprise authentication

🛡️

reCAPTCHA

Google reCAPTCHA v2/v3 for assessment integrity. Prevents bots from corrupting AI personalization data.

Authentic learners only

🔑

JWT Security

Spring Security with JWT tokens. Role-based permissions: students, teachers, administrators.

Spring Security

💳

Stripe Payments

3D Secure authentication for international cards. CustomerSession API. PCI DSS compliant.

PCI DSS

📊

Operational Monitoring

Microsoft Teams webhook alerts for real-time operational notifications. Theme-colored by severity.
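
A severity-colored alert payload might be built as follows. This Python sketch assumes the legacy MessageCard format accepted by Teams incoming webhooks; the `SEVERITY_COLORS` values and the `teams_alert` function are hypothetical choices for illustration, not the platform's actual palette or code.

```python
import json

# Assumption: hex colors chosen for the example, one per severity level.
SEVERITY_COLORS = {"info": "2EB886", "warning": "FFA500", "critical": "D93F0B"}


def teams_alert(title, text, severity="info"):
    """Build a MessageCard payload whose theme color reflects severity."""
    return {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "themeColor": SEVERITY_COLORS.get(severity, SEVERITY_COLORS["info"]),
        "summary": title,
        "title": title,
        "text": text,
    }


def to_request_body(payload):
    """Serialize the card for an HTTP POST with Content-Type: application/json."""
    return json.dumps(payload).encode("utf-8")
```

The payload would be POSTed to the webhook URL configured for the operations channel; the URL itself is deployment-specific and deliberately left out.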

Teams Integration

🏫

B2B School Support

Institutional billing with purchase orders. Bulk subscription management for school-wide deployments. Usage-based pricing for per-minute Live Translation. GDPR and COPPA compliance for educational settings with minors.

K-12 Ready

🔒

Data Privacy Controls

Role-based access ensures teachers see assigned students only. Admin views aggregate school data without accessing individual profiles. Parent consent required for under-13 accounts. Data export and deletion capabilities.

WCAG 2.1 AA

INTEGRATION

LMS Integration & Subscription Engine

Seamless integration with existing educational infrastructure — Thinkific LMS connectivity and flexible subscription management

📚

Thinkific LMS Integration

REST API integration with robust pagination (200 items/page) and automatic retry with exponential backoff for HTTP 429 rate limiting.

Pull course content and enrich with AI — pronunciation, vocabulary, real-time feedback
Structured lessons with Clearable criteria and ClearedLessonData tracking
Type-safe entity mapping for Users, Courses, and Enrollments
GraphQL integration planned for real-time content sync during Live Translation
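
The pagination and retry behavior described above can be sketched in Python as below. The 200-item page size and the exponential backoff on HTTP 429 come from the slide; `fetch_page`, `RateLimited`, `max_retries`, and `base_delay` are hypothetical names and defaults, and the HTTP call itself is left to the caller so that no Thinkific endpoint or response shape is assumed.

```python
import time


class RateLimited(Exception):
    """Raised by fetch_page when the server answers HTTP 429."""


def fetch_all(fetch_page, page_size=200, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Collect every item by walking pages; back off exponentially on 429.

    fetch_page(page, page_size) must return a list of items for that page
    and raise RateLimited when the request is throttled.
    """
    items, page = [], 1
    while True:
        for attempt in range(max_retries + 1):
            try:
                batch = fetch_page(page, page_size)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        items.extend(batch)
        if len(batch) < page_size:
            return items  # a short page means there is nothing further
        page += 1
```

A production client would usually add random jitter to the delay and honor a `Retry-After` header when the server sends one.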

🎯 Course Adventure: Guided learning routes with clear objectives, milestones, and achievements — "learning feels purposeful, motivating, and game-like."

💰

Stripe Subscription Engine

Complete lifecycle management with automatic billing, trial-to-paid conversion tracking, and configurable pricing models.

Trial subscriptions with configurable day lengths
Automatic payment capture and plan upgrades/downgrades
Auto-renewal toggle with payment failure handling
Customer metadata: from, thinkandspeak, active profile

Monetization Models

Free Trial · Paid Subscriptions · Per-Minute Live Translation · Tiered DSE Prep · AI Tutor Credits · Bulk School Licenses

CONCLUSION

Think & Speak:
A Learning Operating System

From the Phoneme to the Paragraph

🎓

For Students

• AI Conversation Partner 24/7
• Phoneme-level pronunciation feedback
• Video flashcard vocabulary system
• Personalized learning pathways

👩‍🏫

For Teachers

• Live Translation classroom support
• Discussion heatmap analytics
• SBA preparation modules
• Real-time progress dashboards

🏫

For Schools

• Thinkific LMS integration
• Enterprise security (PCI, GDPR)
• Scalable 99.9% SLA infrastructure
• Bulk subscription management

Ready to Transform English Education?

Setup takes less than one hour · Free trial available · Full teacher training included

Think & Speak — See Change AI English · SeeChange Education · www.thinkandspeak.com
