Section 1
Platform Architecture Overview
Dual-Product Ecosystem
Think & Speak is a comprehensive language learning ecosystem comprising two main product lines that work together to create a complete learning experience for Hong Kong classrooms.
- AI English — Personalized learning with assessment-driven content, Course Adventure paths, vocabulary mastery, and conversation practice
- Live Translation — Real-time multilingual classroom support bridging English ↔ Mandarin ↔ Cantonese during live sessions
- Both product lines share core infrastructure: WebSocket communication, Azure Speech Services, Redux state management
- Separate Student and Teacher interfaces optimized for distinct roles and workflows
Technology Stack
React / Next.js
TypeScript
Redux
Ant Design
SCSS Modules
Azure Speech SDK
WebSocket
date-fns
- Frontend: React/Next.js with TypeScript for type-safe component development
- State: Redux for global state + local useState for component UI
- AI/ML: Azure Cognitive Services for real-time speech recognition & translation
- Communication: WebSocket for real-time classroom sync, with 10-second polling for room status
- UI: Ant Design component library with custom SCSS module overrides
Security & Data Handling
- End-to-end encryption for all WebSocket data transmission
- Azure-certified cloud infrastructure (ISO 27001, SOC 2)
- GDPR-compliant data handling with school data segregation
- Role-based access control: Teacher (admin) vs Student (personal data only)
- Optional PII fields only — minimizes data collection for K12 students
Component Architecture
The platform uses a modular component architecture separating pedagogical content delivery from real-time classroom management.
Student Interface
ChatBox, MyNotebook, Flashcards, Dashboard
Teacher Interface
RoomControl, Attendance, Heatmap, Export
Trilingual Support
🇬🇧 English: en-US source + target
🇨🇳 Mandarin: zh-CN Simplified Chinese
🇭🇰 Cantonese: zh-HK Traditional Chinese
Auto-routing: English → Cantonese default; Chinese inputs → English
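The auto-routing rule above can be sketched as a small pure function. This is a minimal illustration, not the platform's code: the function names are hypothetical, and detecting Chinese by script (Han characters) is an assumption about how "Chinese inputs" are recognized.

```typescript
// Language codes as listed on the slide.
type LanguageCode = "en-US" | "zh-CN" | "zh-HK";

// Assumption: "Chinese input" is detected by the presence of Han characters.
function containsHanCharacters(text: string): boolean {
  return /\p{Script=Han}/u.test(text);
}

// Hypothetical helper: Chinese input routes to English,
// everything else defaults to Cantonese.
function pickTargetLanguage(input: string): LanguageCode {
  return containsHanCharacters(input) ? "en-US" : "zh-HK";
}
```

Script detection cannot tell Mandarin from Cantonese text, which is fine here because both route to English.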
Learning Pipeline Overview
Listen → Capture → Review → Master
Every classroom utterance becomes a potential learning asset through this automated pipeline
Think & Speak™Section 1 of 13
Section 2
Real-Time Classroom Infrastructure
ChatBox Component
- Dimensional Constraints: 64px fixed header, 96px fixed footer, flexible main content area
- Theme Adaptability: CSS variables --chat-box-bgc and --chat-box-white-bgc
- Responsive: Border radius 32px (desktop) → 0px (mobile) for full-screen immersive mode
- Dual Mode: data-show-result='true' attribute selector for result view
- Flexbox Layout: Header (6%) + Main Content (88%) + Footer (6%) height distribution
- Footer contains input field, recording button, and action controls
Live Translation Engine
- WebSocket communication for real-time subtitle delivery across connected browsers
- Azure Speech Translation Recognizer: en-US source → zh-CN + zh-HK targets
- Intermediate ('recognizing') results enable real-time subtitle streaming
- Final ('recognized') results populate session history for vocabulary extraction
- Message objects contain: originalText, translations, timestamp, senderId
- StickyMessage interface for pinning critical teacher announcements
- Exponential backoff reconnection (max 3 attempts, delays capped at 10s)
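A reconnection schedule of this kind could be computed as follows. This is a sketch under stated assumptions: the 3-attempt limit and 10-second ceiling come from the slide, while the 4-second base delay and the function name are hypothetical.

```typescript
const MAX_RECONNECT_ATTEMPTS = 3; // from the slide
const MAX_DELAY_MS = 10000;       // from the slide (10s)
const BASE_DELAY_MS = 4000;       // assumption: the source does not state a base delay

// Returns the wait before reconnect attempt `attempt` (1-based),
// or null once the attempt budget is exhausted.
function reconnectDelayMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_RECONNECT_ATTEMPTS) return null;
  const exponential = BASE_DELAY_MS * Math.pow(2, attempt - 1);
  return Math.min(exponential, MAX_DELAY_MS);
}
```

A caller would typically pass the result to setTimeout and stop retrying when the function returns null.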
Room Management System
- RoomCard: Visual identity via deterministic theme hashing (roomCardBlue, roomCardYellow, roomCardPink)
- Status Rendering: roomCardInactive class applies opacity/grayscale for completed sessions
- Avatar Overlap: Up to 6 visible avatars with 20% overlap, "+N more" indicator for overflow
- Lifecycle: RoomStatus state machine: WAITING → STARTED → IN_PROGRESS → COMPLETED
- Auto-transition: onlineStudentCount > 0 threshold triggers IN_PROGRESS
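The lifecycle and auto-transition described above might look like this as a transition table. The status names come from the slide. The helper names are hypothetical, and so is the assumption that the auto-transition fires only from STARTED.

```typescript
type RoomStatus = "WAITING" | "STARTED" | "IN_PROGRESS" | "COMPLETED";

// Forward-only transitions, in the order listed on the slide.
const NEXT_STATUS: Record<RoomStatus, RoomStatus | null> = {
  WAITING: "STARTED",
  STARTED: "IN_PROGRESS",
  IN_PROGRESS: "COMPLETED",
  COMPLETED: null,
};

function canTransition(from: RoomStatus, to: RoomStatus): boolean {
  return NEXT_STATUS[from] === to;
}

// Assumption: only a STARTED room is promoted when students come online.
function applyAutoTransition(status: RoomStatus, onlineStudentCount: number): RoomStatus {
  if (status === "STARTED" && onlineStudentCount > 0) return "IN_PROGRESS";
  return status;
}
```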
Session Features
- Multilingual subtitle support across English, Mandarin, and Cantonese
- Participant avatar stacking with visual density indicators
- Room creation/archival workflow with admin controls
- Text selection intelligence: excludes timestamps, headers, translation metadata
- Single-word → vocabulary; Multi-word (≥5) → scripts via 400ms debounce
- Session export for post-class learning asset creation
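The selection routing in the list above (single word to vocabulary, five or more words to scripts) can be illustrated with a small classifier. The function and type names are hypothetical. The source does not say what happens to selections of two to four words, so the sketch reports them as "unspecified" rather than guessing.

```typescript
type SelectionKind = "vocabulary" | "script" | "unspecified";

const SCRIPT_MIN_WORDS = 5; // from the slide: multi-word (≥5) selections become scripts

function classifySelection(selectedText: string): SelectionKind {
  // Split on whitespace and drop empty tokens left by leading or trailing spaces.
  const words = selectedText.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 1) return "vocabulary";
  if (words.length >= SCRIPT_MIN_WORDS) return "script";
  return "unspecified"; // 0 or 2-4 words: behavior not stated in the source
}
```

In the UI this call would presumably sit behind the 400ms debounce mentioned above, so it runs once the selection has settled.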
Room Lifecycle
1. CREATE: Room + Student list
3. MANAGE: Real-time translation
Section 3
The Intelligent Note-Taking Ecosystem
Data Transformation Layer
The transformNoteListToMyNoteBookDialogData function serves as an intelligent categorization engine:
- Parses comma-separated labels from backend, maps standardized tags to icon variants via labelToVariant mapping
- Supports custom categories via slugifyTag (prefix "custom-" for user-created tags)
- Calculates per-category counts for display with "new" badges on recent items
- Memoized for performance: recomputes only when dependencies change
- NoteItem component: text, remark (max 250 chars), label, comment[] array
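The label parsing and per-category counting described above could be sketched like this. It is an illustration only: the contents of labelToVariant, the exact slug rules, and the countByCategory helper are assumptions, since the source names the functions but does not show them.

```typescript
// Hypothetical subset of the standardized tags; the real mapping is not shown in the source.
const labelToVariant = new Map<string, string>([
  ["Lecture", "lecture"],
  ["Vocabulary", "vocabulary"],
  ["Grammar", "grammar"],
]);

// Assumed slug rules: lower-case, non-alphanumeric runs become "-", then "custom-" is prepended.
function slugifyTag(tag: string): string {
  const slug = tag
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, "-")
    .replace(/^-+|-+$/g, "");
  return `custom-${slug}`;
}

interface RawNote {
  text: string;
  label: string; // comma-separated labels as delivered by the backend
}

// Counts notes per category key; a note with several labels counts once per label.
function countByCategory(notes: RawNote[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const note of notes) {
    const labels = note.label.split(",").map((l) => l.trim()).filter((l) => l.length > 0);
    for (const label of labels) {
      const key = labelToVariant.get(label) ?? slugifyTag(label);
      counts[key] = (counts[key] ?? 0) + 1;
    }
  }
  return counts;
}
```

In a React component the result would typically be wrapped in useMemo with the note list as its dependency.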
UI/UX Design Patterns
- Grid Navigation: Category cards with color-coded left borders in responsive grid
- Real-time Search: Full-text filtering across note content and categories
- Tour Integration: isTourMode flag enables mockNoteData injection during guided onboarding
- Batch Operations: Multi-select with confirmation modals for deletion
- Smart Fallbacks: Graceful handling when note data is missing or corrupt
- Dual-Mode Editing: Review mode (static tags) vs Capture mode (checkbox multi-label)
250-Character Constraint: Mirrors Cornell Notes methodology — encourages concise summarization rather than verbatim transcription. Develops metacognitive awareness through personal reflection.
Nine-Category Taxonomy
Each category has a unique color and icon for instant visual recognition. Students can tag notes with multiple categories for cross-referencing.
Lecture
Notes
Daily Log
Questions
Discussion
Instructions
Assignment
Vocabulary
Grammar
Research-backed: Categorization + annotation improves vocabulary retrieval by 40-60% (Schmidt's Noticing Hypothesis). Students who use metacognitive tags retain more than passive note-takers.
Review Mode vs Capture Mode
📖 Review Mode
- Static antd Tag components
- Read-only category display
- Click to filter by tag
- Archived review interface
✍️ Capture Mode
- Checkbox.Group for multi-label
- Real-time categorization
- AutoComplete suggestions
- Active learning mode
Bidirectional Learning Flows
Words → Flashcards + Phrases → Notebook
Words → Pronunciation practice | Phrases → Contextual review
Section 4
Vocabulary Acquisition & Session History Pipeline
Word Extraction Engine
The separateWords function implements linguistic tokenization:
- Splits text by whitespace, removes punctuation and numeric tokens
- Regex pattern: /[^\p{L}]/gu (Unicode property escapes for international text)
- Filters for meaningful alphabetic content only
- Chinese characters excluded from English vocabulary lists automatically
- Validation: containsChinese() must return FALSE, containsEnglish() must return TRUE
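The tokenization steps above can be sketched as follows. The regex is the one quoted on the slide. The bodies of separateWords, containsChinese, and containsEnglish are assumptions about how the named functions might behave, not the platform's implementation.

```typescript
// Matches any character that is not a Unicode letter (pattern quoted on the slide).
const NON_LETTER = /[^\p{L}]/gu;

// Assumed check: any Han character marks the token as Chinese.
function containsChinese(text: string): boolean {
  return /\p{Script=Han}/u.test(text);
}

// Assumed check: at least one basic Latin letter.
function containsEnglish(text: string): boolean {
  return /[A-Za-z]/.test(text);
}

function separateWords(text: string): string[] {
  return text
    .split(/\s+/)
    // Strip punctuation and digits; purely numeric tokens become empty strings.
    .map((token) => token.replace(NON_LETTER, ""))
    .filter((word) => word.length > 0 && !containsChinese(word) && containsEnglish(word));
}
```

One side effect of stripping all non-letters is that contractions lose their apostrophe ("don't" becomes "dont"); a production tokenizer may want to treat that case separately.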
Vocabulary Management & Entity Recognition
- getWordEntity / hasWordEntity: Case-insensitive matching against taggedWordList
- Visual Distinction: sessionHistoryWordWithEntity class → green for known words
- Zone of Proximal Development: Blue styling for new words (learning target)
- Practice Integration: Click any word → LiveWordPracticeDialog (dynamic import)
- toPracticeCount: filteredWordList.filter(item => item.tags.includes('To Practice')).length
- Default Tag: All captured words auto-tagged "To Practice" for zero-friction capture
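A minimal sketch of the entity lookup and practice count described above. The TaggedWord shape is inferred from the toPracticeCount expression on the slide. The class name for new words is hypothetical, since the source names only the class for known words.

```typescript
interface TaggedWord {
  word: string;
  tags: string[];
}

// Case-insensitive lookup against the student's tagged word list.
function getWordEntity(word: string, taggedWordList: TaggedWord[]): TaggedWord | undefined {
  const needle = word.toLowerCase();
  return taggedWordList.find((item) => item.word.toLowerCase() === needle);
}

function hasWordEntity(word: string, taggedWordList: TaggedWord[]): boolean {
  return getWordEntity(word, taggedWordList) !== undefined;
}

// "sessionHistoryWordWithEntity" is from the slide (green, known word);
// "sessionHistoryWordNew" is a hypothetical name for the blue, new-word style.
function wordClassName(word: string, taggedWordList: TaggedWord[]): string {
  return hasWordEntity(word, taggedWordList) ? "sessionHistoryWordWithEntity" : "sessionHistoryWordNew";
}

// Same expression as on the slide, wrapped in a function.
function toPracticeCount(filteredWordList: TaggedWord[]): number {
  return filteredWordList.filter((item) => item.tags.includes("To Practice")).length;
}
```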
Temporal Organization
- Smart date formatting using date-fns library
- Today: Shows time only (HH:mm:ss)
- Current year: Shows month/day + time
- Previous years: Shows full date
- Chronological listing with delete capabilities
- Session history persists via localStorage for crash recovery
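The three display rules above can be illustrated with a dependency-free sketch. The platform is described as using date-fns; this version uses plain Date so that it stands alone. The function name and the exact month/day and full-date patterns are assumptions, as only HH:mm:ss is given in the source.

```typescript
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// `now` is injectable so the function is deterministic in tests.
function formatSessionTime(date: Date, now: Date = new Date()): string {
  const time = `${pad2(date.getHours())}:${pad2(date.getMinutes())}:${pad2(date.getSeconds())}`;
  const monthDay = `${pad2(date.getMonth() + 1)}/${pad2(date.getDate())}`;
  const sameYear = date.getFullYear() === now.getFullYear();
  const sameDay = sameYear && date.getMonth() === now.getMonth() && date.getDate() === now.getDate();

  if (sameDay) return time;                   // today: time only
  if (sameYear) return `${monthDay} ${time}`; // current year: month/day + time (assumed pattern)
  return `${date.getFullYear()}/${monthDay}`; // other years: full date (assumed pattern)
}
```

All comparisons use local time, so a session recorded just before midnight counts as "yesterday" once the local date rolls over.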
Complete Pipeline
Stage 1: Live Session Text → Azure Speech SDK captures utterances
↓
Stage 2: Word Extraction → separateWords() tokenizes, filters, validates
↓
Stage 3: Entity Recognition → Match against taggedWordList (green = known, blue = new)
↓
Stage 4: Vocabulary Bank → Categorize, tag, store in personal collection
↓
Stage 5: Flashcard Practice → Pronunciation scoring, spaced repetition, mastery tracking
Section 5
Onboarding & User Experience Engineering
Tour System Architecture
- Redux Integration: tourProgress and isTouringAppPage persisted in global state
- Step Indexing: Role-specific initialization (student vs teacher tour start points)
- Progress Tracking: Backend synchronization via useCallUpdateUserGuide API
- Conditional UI: Modal opening/closing logic controlled by tour state
- Mock Data: tourMockNotes provides curated sample content during tours
- Activity Logging: reportActivityLog captures learning engagement metrics
- Pedagogical Model: Gradual Release of Responsibility (Pearson & Gallagher, 1983)
Bilingual Support Strategy
- All tour content available in English and Cantonese
- Translanguaging pedagogy: students learn navigation in comfort language
- English terminology absorbed through parallel bilingual exposure
- Reduces affective filter — students comfortable engaging immediately
- Footer always shows product name in primary language for brand anchoring
Student Tour (10 Steps)
Steps 1-2: Welcome & Messages
Introduction to interface, message history overview
Steps 3-5: My Notes
Note capture, categorization, 250-char remark system
Steps 6-8: My Notebook
Category browsing, search, dual-channel flow
Steps 9-10: My Flashcards
Spaced repetition practice, pronunciation scoring
Teacher Tour (6 Steps)
Steps 1-3: Room Setup
Create room, configure students, review options
Step 4: Language Settings
Set translation targets, source language
Steps 5-6: Live Session
Start session, manage attendees
Completion 🎉
Mascot celebration, "Ready to Shine?" dialog
Mascot Integration
🦄
"Ari" the mascot appears at consistent positions during onboarding, providing visual continuity anchors. Floating emojis (📎 📖 ✨) celebrate milestones. The finish dialog creates a sense of achievement through gradient backgrounds and particle animations.
Section 6
Technical Implementation Insights
State Management Patterns
- useCall[Action]: Custom hook pattern for standardized API layers with loading/error states
- useSelector: Redux selectors for global state (tour progress, room data, vocabulary lists)
- useState: Local component state for UI interactions (modal visibility, form inputs)
- useMemo: Expensive computations (note filtering, word extraction, entity matching)
- React Refs: wsRef, reconnectTimeoutRef for WebSocket lifecycle without re-renders
- useDebounceFn: Input debouncing (300-500ms) for recording controls
Resilience Patterns
- WebSocket reconnection: exponential backoff (max 3 attempts, 10s cap)
- Auto token refresh: every 10 minutes via autoRefetchRecognitionTokenTimeoutRef
- Message deduplication: Set-based tracking prevents duplicate WebSocket notifications
- Graceful degradation: cancellation handlers prevent interface hanging on errors
- localStorage backup: useLocalStorage hook persists critical session data
- Failure-count ceilings: polling mechanisms cap retry attempts to prevent infinite loops
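Set-based deduplication, as listed above, might be sketched like this. The message fields come from the Live Translation Engine description. Building the key from senderId, timestamp, and originalText is an assumption, as the source does not say which identifier is tracked.

```typescript
interface IncomingMessage {
  senderId: string;
  timestamp: number;
  originalText: string;
}

// Assumed identity: the same sender, timestamp, and text means the same message.
function messageKey(message: IncomingMessage): string {
  return `${message.senderId}|${message.timestamp}|${message.originalText}`;
}

// Returns a function that reports true the first time a message is seen
// and false for any repeat delivery.
function createDeduplicator(): (message: IncomingMessage) => boolean {
  const seen = new Set<string>();
  return (message) => {
    const key = messageKey(message);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
}
```

In a long-running session the Set grows without bound, so a real implementation would likely clear it when the room closes.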
Responsive Design Strategy
- useBreakpoints: Hook for mobile/tablet/desktop detection
- Conditional Rendering: Different layouts per viewport (flex vs stack)
- SCSS Global Selectors: :global(.mobile) pattern for responsive overrides
- Breakpoints: 841px and up (desktop), 564px to 840px (tablet), below 564px (mobile)
- Touch Targets: Minimum 44x44px per Apple Human Interface Guidelines
Performance Optimizations
- Dynamic Imports: Practice dialogs use ssr:false for code splitting
- Memoized Components: Expensive renders cached until dependencies change
- Optimistic Updates: UI updates immediately with backend rollback on error
- 10-Second Poll: continuouslyFetchRoomsDelay constant for room status
- SVG Icons: Vector graphics (not font icons) prevent FOIT during loading
- Tree Shaking: Barrel exports in Index.tsx enable per-route optimization
Architecture Quality: TypeScript strict mode + CSS Modules + component-level isolation ensures long-term maintainability. Enterprise-grade patterns for institutional deployment.
Section 7
The Cognitive Learning Loop
Phase 1: LISTEN
Comprehensible Input (Krashen's Theory)
- Real-time WebSocket translation delivers slightly advanced content
- Bilingual subtitles in English ↔ Mandarin/Cantonese simultaneously
- Reduces anxiety by removing language comprehension barriers
- Students access content they wouldn't understand otherwise
- Source message preserves original language for practice reference
Phase 2: CAPTURE
Noticing Hypothesis (Schmidt)
- Note capture forces attention to form — converts input to intake
- Word selection with entity recognition (green/blue color coding)
- Metacognitive categorization: 9 tag types for self-awareness
- 250-character remark constraint for concise summarization
- Auto-tagging "To Practice" for zero-friction vocabulary capture
- Multi-word selection (≥5) auto-generates scripts for Read & Speak
Phase 3: REVIEW
Output Practice (Swain's Hypothesis)
- MyNoteBookDialog organizes notes by category for structured review
- Full-text search across all historical notes and sessions
- Vocabulary bank with practice count tracking per word
- Session history as learning journal — every class becomes reviewable
- ±50 character context window preserves usage context
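The ±50 character context window mentioned above can be sketched as a slice around the selection. The function name and its offset-based signature are hypothetical. Only the 50-character radius comes from the source.

```typescript
const CONTEXT_RADIUS = 50; // from the slide: ±50 characters

// `start` and `end` are the selection's offsets in `fullText` (end exclusive).
// The window is clamped at both ends of the text.
function extractContext(fullText: string, start: number, end: number, radius: number = CONTEXT_RADIUS): string {
  const from = Math.max(0, start - radius);
  const to = Math.min(fullText.length, end + radius);
  return fullText.slice(from, to);
}
```

Offsets here are UTF-16 code units, so a window edge can fall inside a surrogate pair (for example an emoji); a stricter version would snap to character boundaries.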
Phase 4: MASTER
Metacognitive Training + Spaced Repetition
- FlashCardDialog with three states: Need Review → Active → Mastered
- 80-90% archive threshold — quality over quantity mandate
- Practice countMax enforcement prevents gaming the system
- Phoneme-level scoring at each practice attempt
- Mascot feedback (5-tier system) maintains motivation
- Dashboard tracks mastery progression over time
This isn't accidental: Every technical decision was made with pedagogical theory in mind. Krashen for input, Schmidt for noticing, Swain for output — the platform operationalizes second language acquisition research into engineering.
Section 8
Vocabulary Acquisition Probability Engine
Importance Algorithm
The hasWordEntity function creates a prioritized learning queue based on the student's personal database (taggedWordListData).
- Known Territory (Green): Words already in vocabulary receive sessionHistoryWordWithEntity class — signal for reinforcement practice
- Zone of Proximal Development (Blue): New words represent learning targets — students see exactly what needs attention
- Confidence Calibration: Color coding tells students at a glance what to review vs. what to learn
- Personalized Priority: Algorithm considers word frequency in sessions, practice history, and mastery state
- Prevents Overwhelm: Known words don't clutter the learning interface — focus stays on new acquisition
Progress Visualization
Visual feedback through the color-coded confidence system:
🔵
New Words
Zone of Proximal Development
🟢
Known Words
Reinforcement Target
⭐
Mastered
80-90%+ Score Archive
Contextual Capture Workflow
Step 1: Select Word from transcript
↓
Step 2: Validate (English only, no Chinese)
↓
Step 3: Match Entity against vocabulary database
↓
Step 4: Categorize with tags (auto "To Practice")
↓
Step 5: Add to Practice Queue for flashcard review
Validation Rules
const wordsToAdd = extractEnglishWords(selectedTextRange.text);
if (wordsToAdd.length === 0) {
  message.warning('Word must contain English');
  return;
}
Strict validation prevents mixed-language confusion during flashcard practice. Maintains vocabulary database integrity for effective spaced repetition.
Section 9
Pedagogical UX Design & Progressive Disclosure
Vygotskian Scaffolding Architecture
The tour system implements the Gradual Release of Responsibility model (Pearson & Gallagher, 1983), matching learning theory with UX design.
- Teacher Modeling (Steps 1-2): System demonstrates features → student observes
- Guided Practice (Steps 3-5): System leads → student follows with scaffolded support
- Collaborative Learning (Steps 6-8): System collaborates → student applies with hints
- Independent Application (Steps 9-10): Student navigates → system celebrates completion
- Each phase reduces scaffolding while increasing student agency
- Progressive disclosure prevents overwhelm — features revealed only when relevant
Micro-Interaction Design for Engagement
- 250-Character Constraint: Mirrors Cornell Notes methodology — encourages concise thinking over verbatim transcription
- Dual-Mode Editing: Review vs Capture modes create seamless transitions without interrupting learning flow
- Auto-Sizing Textarea: minRows: 1, maxRows: 4 — contextual expansion prevents visual clutter
- AutoComplete Suggestions: "Key Point", "Don't Understand", "Exam Material", "Review Later" reduce cognitive load during categorization
- Immediate Feedback: Every interaction produces visible response within 200ms (perceived instant)
Cognitive Load Management
- Font Size: 24px desktop / 18px mobile — accommodates bilingual reading (EN + ZH)
- Line Height: 32px desktop / 26px mobile — prevents visual crowding during simultaneous EN-ZH reading
- Max Width: 75% desktop / 90% mobile — optimal reading line length (~66 characters)
- Spacing: 60px top padding creates "classroom whiteboard" separation on desktop
- Visual Hierarchy: Color-coded categories, icon-based navigation, progressive disclosure
AI Presence & Interruption Design
- nodRocket Animation: Blue Ari mascot "nods" while the AI processes, mimicking a human teacher's non-verbal cues during wait time
- Addresses Uncanny Valley: Students perceive active engagement, not dead time
- Skip Button (opacity: 0 → 100 on hover): Students can interrupt AI mid-explanation
- Builds Speaking Confidence: Interrupting in conversation is a real-world skill
- Activity State Awareness: "Clear History" hidden during active AI explanation — critical safeguard for young learners
Research-Backed: Every micro-interaction is grounded in learning science. The 250-char constraint (Cornell Notes), progressive disclosure (Vygotsky), and affective feedback loops (Krashen) transform UX into pedagogy.