Think & Speak™ — AI English Learning Platform Workshop

Section 1

Platform Architecture Overview

Dual-Product Ecosystem

Think & Speak is a comprehensive language learning ecosystem comprising two main product lines that work together to create a complete learning experience for Hong Kong classrooms.

AI English — Personalized learning with assessment-driven content, Course Adventure paths, vocabulary mastery, and conversation practice
Live Translation — Real-time multilingual classroom support bridging English ↔ Mandarin ↔ Cantonese during live sessions
Both product lines share core infrastructure: WebSocket communication, Azure Speech Services, Redux state management
Separate Student and Teacher interfaces optimized for distinct roles and workflows

Technology Stack

React / Next.js · TypeScript · Redux · Ant Design · SCSS Modules · Azure Speech SDK · WebSocket · date-fns

Frontend: React/Next.js with TypeScript for type-safe component development
State: Redux for global state + local useState for component UI
AI/ML: Azure Cognitive Services for real-time speech recognition & translation
Communication: WebSocket for real-time classroom sync, plus 10-second polling for room status
UI: Ant Design component library with custom SCSS module overrides

Security & Data Handling

End-to-end encryption for all WebSocket data transmission
Azure-certified cloud infrastructure (ISO 27001, SOC 2)
GDPR-compliant data handling with school data segregation
Role-based access control: Teacher (admin) vs Student (personal data only)
Optional PII fields only: minimizes data collection for K-12 students

Component Architecture

The platform uses a modular component architecture separating pedagogical content delivery from real-time classroom management.

Student Interface

ChatBox, MyNotebook, Flashcards, Dashboard

Teacher Interface

RoomControl, Attendance, Heatmap, Export

Trilingual Support

🇬🇧 English: en-US, source + target
🇨🇳 Mandarin: zh-CN, Simplified Chinese
🇭🇰 Cantonese: zh-HK, Traditional Chinese

Auto-routing: English → Cantonese default; Chinese inputs → English
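
The auto-routing rule above can be sketched as a small pure function. This is an illustration only: the `Route` type, the `routeTranslation` name, and the Han-script check are assumptions, and because written text alone cannot distinguish Mandarin from Cantonese, the Chinese source is passed in as a parameter.

```typescript
// Language codes come from the slide; every name below is hypothetical.
type LangCode = "en-US" | "zh-CN" | "zh-HK";

interface Route {
  source: LangCode;
  targets: LangCode[];
}

// Assumption: any Han-script character marks the input as Chinese.
const HAN = /\p{Script=Han}/u;

function routeTranslation(
  text: string,
  chineseSource: "zh-CN" | "zh-HK" = "zh-HK",
): Route {
  if (HAN.test(text)) {
    // Chinese inputs are routed to English.
    return { source: chineseSource, targets: ["en-US"] };
  }
  // English inputs default to Cantonese.
  return { source: "en-US", targets: ["zh-HK"] };
}
```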

Learning Pipeline Overview

Listen → Capture → Review → Master

Every classroom utterance becomes a potential learning asset through this automated pipeline

Think & Speak™Section 1 of 13

Section 2

Real-Time Classroom Infrastructure

ChatBox Component

Dimensional Constraints: 64px fixed header, 96px fixed footer, flexible main content area
Theme Adaptability: CSS variables --chat-box-bgc and --chat-box-white-bgc
Responsive: Border radius 32px (desktop) → 0px (mobile) for full-screen immersive mode
Dual Mode: data-show-result='true' attribute selector for result view
Flexbox Layout: Header (6%) + Main Content (88%) + Footer (6%) height distribution
Footer contains input field, recording button, and action controls

Live Translation Engine

WebSocket communication for real-time subtitle delivery across connected browsers
Azure Speech Translation Recognizer: en-US source → zh-CN + zh-HK targets
Intermediate ('recognizing') results enable real-time subtitle streaming
Final ('recognized') results populate session history for vocabulary extraction
Message objects contain: originalText, translations, timestamp, senderId
StickyMessage interface for pinning critical teacher announcements
Exponential backoff reconnection (max 3 attempts, delays capped at 10s)
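
One way to picture the split between interim and final results is a small state update. The field names of `Message` follow the slide; the concrete types, `SubtitleState`, and `applyResult` are assumptions made for this sketch, and no Azure SDK calls are shown.

```typescript
// Field names follow the slide; the concrete types are assumptions.
interface Message {
  originalText: string;
  translations: Record<string, string>; // e.g. { "zh-CN": "...", "zh-HK": "..." }
  timestamp: number; // epoch milliseconds (assumed)
  senderId: string;
}

type ResultKind = "recognizing" | "recognized";

interface SubtitleState {
  live: Message | null; // subtitle currently streaming on screen
  history: Message[]; // finalized utterances, later mined for vocabulary
}

function applyResult(
  state: SubtitleState,
  kind: ResultKind,
  msg: Message,
): SubtitleState {
  if (kind === "recognizing") {
    // Interim hypothesis: overwrite the live subtitle, leave history untouched.
    return { ...state, live: msg };
  }
  // Final result: clear the live line and append to session history.
  return { live: null, history: [...state.history, msg] };
}
```

Keeping interim text out of `history` means vocabulary extraction only ever sees finalized utterances.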

Room Management System

RoomCard: Visual identity via deterministic theme hashing (roomCardBlue, roomCardYellow, roomCardPink)
Status Rendering: roomCardInactive class applies opacity/grayscale for completed sessions
Avatar Overlap: Up to 6 visible avatars with 20% overlap, "+N more" indicator for overflow
Lifecycle: RoomStatus state machine: WAITING → STARTED → IN_PROGRESS → COMPLETED
Auto-transition: onlineStudentCount > 0 threshold triggers IN_PROGRESS
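
The lifecycle and theme rules above can be sketched as follows. The status names, the `onlineStudentCount > 0` threshold, and the three theme class names come from the slide; the transition table, the helper names, and the 31-multiplier string hash are assumptions, since the actual hash is not shown.

```typescript
type RoomStatus = "WAITING" | "STARTED" | "IN_PROGRESS" | "COMPLETED";

// Allowed forward transitions, taken from the lifecycle on the slide.
const NEXT: Record<RoomStatus, RoomStatus | null> = {
  WAITING: "STARTED",
  STARTED: "IN_PROGRESS",
  IN_PROGRESS: "COMPLETED",
  COMPLETED: null,
};

function canTransition(from: RoomStatus, to: RoomStatus): boolean {
  return NEXT[from] === to;
}

// Auto-transition: a started room becomes IN_PROGRESS once a student is online.
function autoAdvance(status: RoomStatus, onlineStudentCount: number): RoomStatus {
  return status === "STARTED" && onlineStudentCount > 0 ? "IN_PROGRESS" : status;
}

const THEMES = ["roomCardBlue", "roomCardYellow", "roomCardPink"] as const;
type RoomTheme = (typeof THEMES)[number];

// Assumption: a simple polynomial string hash; the same id always maps to the same theme.
function themeForRoom(roomId: string): RoomTheme {
  let hash = 0;
  for (let i = 0; i < roomId.length; i++) {
    hash = (hash * 31 + roomId.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return THEMES[hash % THEMES.length];
}
```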

Session Features

Bilingual subtitle support across English, Mandarin, Cantonese
Participant avatar stacking with visual density indicators
Room creation/archival workflow with admin controls
Text selection intelligence: excludes timestamps, headers, translation metadata
Single-word → vocabulary; Multi-word (≥5) → scripts via 400ms debounce
Session export for post-class learning asset creation
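
The selection rule (single word → vocabulary, five or more words → script, 400ms debounce) might look like the sketch below. `classifySelection` and `debounce` are hypothetical names, and treating empty or 2-4 word selections as a no-op is an assumption the slide does not confirm.

```typescript
type SelectionKind = "vocabulary" | "script" | "none";

const SCRIPT_MIN_WORDS = 5; // from the slide: multi-word (≥5) → scripts

function classifySelection(selected: string): SelectionKind {
  const words = selected.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length === 1) return "vocabulary";
  if (words.length >= SCRIPT_MIN_WORDS) return "script";
  // Assumption: empty selections and 2-4 word selections trigger nothing.
  return "none";
}

// Minimal trailing debounce: only the last call within waitMs runs.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

A selection handler could then be wrapped as `debounce(handleSelection, 400)`, where `handleSelection` is whatever callback consumes the classified text.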

Room Lifecycle

1. CREATE: Room + Student list
2. START: Teacher begins
3. MANAGE: Real-time translation
4. END: Archive + Export

Section 3

The Intelligent Note-Taking Ecosystem

Data Transformation Layer

The transformNoteListToMyNoteBookDialogData function serves as an intelligent categorization engine:

Parses comma-separated labels from backend, maps standardized tags to icon variants via labelToVariant mapping
Supports custom categories via slugifyTag — prefix "custom-" for user-created tags
Calculates per-category counts for display with "new" badges on recent items
Memoized for performance: recomputes only when dependencies change
NoteItem component: text, remark (max 250 chars), label, comment[] array
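
A rough sketch of the categorization steps listed above, not the actual transformNoteListToMyNoteBookDialogData implementation. The `NoteItem` fields, `labelToVariant`, `slugifyTag`, and the "custom-" prefix come from the slide; the mapping entries, the slug rules, and the helper names `parseLabels`, `categoryKey`, and `countByCategory` are assumptions.

```typescript
interface NoteItem {
  text: string;
  remark: string; // max 250 characters
  label: string; // comma-separated labels as sent by the backend
  comment: string[];
}

// Hypothetical subset of the mapping; the full table is not in the source.
const labelToVariant = new Map<string, string>([
  ["Vocabulary", "vocabulary"],
  ["Grammar", "grammar"],
  ["Questions", "questions"],
]);

// Assumed slug rules: lowercase, non-alphanumerics become "-", prefix "custom-".
function slugifyTag(tag: string): string {
  const slug = tag
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `custom-${slug}`;
}

function parseLabels(label: string): string[] {
  return label
    .split(",")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
}

// Standard labels map to a variant; anything else is treated as a custom tag.
function categoryKey(label: string): string {
  return labelToVariant.get(label) ?? slugifyTag(label);
}

function countByCategory(notes: NoteItem[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const note of notes) {
    for (const label of parseLabels(note.label)) {
      const key = categoryKey(label);
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}
```

In a React component the result would typically be wrapped in `useMemo` keyed on the note list, which matches the memoization point above.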

UI/UX Design Patterns

Grid Navigation: Category cards with color-coded left borders in responsive grid
Real-time Search: Full-text filtering across note content and categories
Tour Integration: isTourMode flag enables mockNoteData injection during guided onboarding
Batch Operations: Multi-select with confirmation modals for deletion
Smart Fallbacks: Graceful handling when note data is missing or corrupt
Dual-Mode Editing: Review mode (static tags) vs Capture mode (checkbox multi-label)

250-Character Constraint: Mirrors Cornell Notes methodology — encourages concise summarization rather than verbatim transcription. Develops metacognitive awareness through personal reflection.

Nine-Category Taxonomy

Each category has a unique color and icon for instant visual recognition. Students can tag notes with multiple categories for cross-referencing.

Lecture Notes Daily Log Questions Discussion Instructions Assignment Vocabulary Grammar

Research-backed: Categorization + annotation improves vocabulary retrieval by 40-60% (Schmidt's Noticing Hypothesis). Students who use metacognitive tags retain more than passive note-takers.

Review Mode vs Capture Mode

📖 Review Mode

• Static antd Tag components
• Read-only category display
• Click to filter by tag
• Archived review interface

✍️ Capture Mode

• Checkbox.Group for multi-label
• Real-time categorization
• AutoComplete suggestions
• Active learning mode

Bidirectional Learning Flows

Words → Flashcards
Phrases → Notebook

Words → Pronunciation practice | Phrases → Contextual review

Section 4

Vocabulary Acquisition & Session History Pipeline

Word Extraction Engine

The separateWords function implements linguistic tokenization:

Splits text by whitespace, removes punctuation and numeric tokens
Regex pattern: /[^\p{L}]/gu — Unicode property escapes for international text
Filters for meaningful alphabetic content only
Chinese characters excluded from English vocabulary lists automatically
Validation: containsChinese() must return FALSE, containsEnglish() must return TRUE
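
A minimal sketch of the tokenization described above. The `/[^\p{L}]/gu` pattern and the function names come from the slide; the Han-script and ASCII-letter checks used for containsChinese() and containsEnglish() are assumptions about how those helpers work.

```typescript
const NON_LETTER = /[^\p{L}]/gu; // pattern quoted on the slide
const HAN = /\p{Script=Han}/u; // assumption: Han script marks Chinese text
const LATIN = /[A-Za-z]/; // assumption: ASCII letters mark English text

function containsChinese(token: string): boolean {
  return HAN.test(token);
}

function containsEnglish(token: string): boolean {
  return LATIN.test(token);
}

function separateWords(text: string): string[] {
  return text
    .split(/\s+/) // split by whitespace
    .map((t) => t.replace(NON_LETTER, "")) // strip punctuation and digits
    .filter((t) => t.length > 0 && !containsChinese(t) && containsEnglish(t));
}

const sample = separateWords("Open your books, 同學們! Page 42.");
// sample is ["Open", "your", "books", "Page"]
```

Note that stripping every non-letter also removes apostrophes, so "don't" becomes "dont" under this pattern.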

Vocabulary Management & Entity Recognition

getWordEntity / hasWordEntity: Case-insensitive matching against taggedWordList
Visual Distinction: sessionHistoryWordWithEntity class → green for known words
Zone of Proximal Development: Blue styling for new words (learning target)
Practice Integration: Click any word → LiveWordPracticeDialog (dynamic import)
toPracticeCount: filteredWordList.filter(item => item.tags.includes('To Practice')).length
Default Tag: All captured words auto-tagged "To Practice" for zero-friction capture
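
The lookup and counting logic above can be sketched like this. `getWordEntity`, `hasWordEntity`, the `sessionHistoryWordWithEntity` class, and the "To Practice" tag come from the slide; the `TaggedWord` shape, `wordClass`, and `captureWord` are hypothetical.

```typescript
// Assumed shape of one entry in taggedWordList.
interface TaggedWord {
  word: string;
  tags: string[];
}

// Case-insensitive match against the student's tagged word list.
function getWordEntity(list: TaggedWord[], word: string): TaggedWord | undefined {
  const needle = word.toLowerCase();
  return list.find((item) => item.word.toLowerCase() === needle);
}

function hasWordEntity(list: TaggedWord[], word: string): boolean {
  return getWordEntity(list, word) !== undefined;
}

// Known words get the class from the slide; returning "" for new words is an assumption.
function wordClass(list: TaggedWord[], word: string): string {
  return hasWordEntity(list, word) ? "sessionHistoryWordWithEntity" : "";
}

// New captures are auto-tagged "To Practice"; known words are left unchanged.
function captureWord(list: TaggedWord[], word: string): TaggedWord[] {
  if (hasWordEntity(list, word)) return list;
  return [...list, { word, tags: ["To Practice"] }];
}

function toPracticeCount(list: TaggedWord[]): number {
  return list.filter((item) => item.tags.includes("To Practice")).length;
}
```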

Temporal Organization

Smart date formatting using date-fns library
Today: Shows time only (HH:mm:ss)
Current year: Shows month/day + time
Previous years: Shows full date
Chronological listing with delete capabilities
Session history persists via localStorage for crash recovery
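
The three formatting tiers can be illustrated with plain `Date` getters. The platform uses date-fns, so this is a dependency-free stand-in rather than the actual code; `formatSessionTime` is a hypothetical name, and the exact month/day and full-date layouts are assumptions.

```typescript
const pad = (n: number): string => String(n).padStart(2, "0");

// `now` is injectable so the function stays deterministic in tests.
function formatSessionTime(date: Date, now: Date = new Date()): string {
  const time = `${pad(date.getHours())}:${pad(date.getMinutes())}:${pad(date.getSeconds())}`;
  const monthDay = `${pad(date.getMonth() + 1)}/${pad(date.getDate())}`;
  const sameYear = date.getFullYear() === now.getFullYear();
  const sameDay =
    sameYear &&
    date.getMonth() === now.getMonth() &&
    date.getDate() === now.getDate();

  if (sameDay) return time; // today: time only
  if (sameYear) return `${monthDay} ${time}`; // current year: month/day + time
  return `${date.getFullYear()}/${monthDay}`; // previous years: full date
}
```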

Complete Pipeline

Stage 1: Live Session Text → Azure Speech SDK captures utterances

↓

Stage 2: Word Extraction → separateWords() tokenizes, filters, validates

↓

Stage 3: Entity Recognition → Match against taggedWordList (green = known, blue = new)

↓

Stage 4: Vocabulary Bank → Categorize, tag, store in personal collection

↓

Stage 5: Flashcard Practice → Pronunciation scoring, spaced repetition, mastery tracking

Section 5

Onboarding & User Experience Engineering

Tour System Architecture

Redux Integration: tourProgress and isTouringAppPage persisted in global state
Step Indexing: Role-specific initialization (student vs teacher tour start points)
Progress Tracking: Backend synchronization via useCallUpdateUserGuide API
Conditional UI: Modal opening/closing logic controlled by tour state
Mock Data: tourMockNotes provides curated sample content during tours
Activity Logging: reportActivityLog captures learning engagement metrics
Pedagogical Model: Gradual Release of Responsibility (Pearson & Gallagher, 1983)

Bilingual Support Strategy

All tour content available in English and Cantonese
Translanguaging pedagogy: students learn navigation in comfort language
English terminology absorbed through parallel bilingual exposure
Reduces affective filter — students comfortable engaging immediately
Footer always shows product name in primary language for brand anchoring

Student Tour (10 Steps)

Steps 1-2: Welcome & Messages

Introduction to interface, message history overview

Steps 3-5: My Notes

Note capture, categorization, 250-char remark system

Steps 6-8: My Notebook

Category browsing, search, dual-channel flow

Steps 9-10: My Flashcards

Spaced repetition practice, pronunciation scoring

Teacher Tour (6 Steps)

Steps 1-3: Room Setup

Create room, configure students, review options

Step 4: Language Settings

Set translation targets, source language

Steps 5-6: Live Session

Start session, manage attendees

Completion 🎉

Mascot celebration, "Ready to Shine?" dialog

Mascot Integration

🦄

"Ari" the mascot appears at consistent positions during onboarding, providing visual continuity anchors. Floating emojis (📎 📖 ✨) celebrate milestones. The finish dialog creates a sense of achievement through gradient backgrounds and particle animations.

Section 6

Technical Implementation Insights

State Management Patterns

useCall[Action]: Custom hook pattern for standardized API layers with loading/error states
useSelector: Redux selectors for global state (tour progress, room data, vocabulary lists)
useState: Local component state for UI interactions (modal visibility, form inputs)
useMemo: Expensive computations (note filtering, word extraction, entity matching)
React Refs: wsRef, reconnectTimeoutRef for WebSocket lifecycle without re-renders
useDebounceFn: Input debouncing (300-500ms) for recording controls

Resilience Patterns

WebSocket reconnection: exponential backoff (max 3 attempts, 10s cap)
Auto token refresh: every 10 minutes via autoRefetchRecognitionTokenTimeoutRef
Message deduplication: Set-based tracking prevents duplicate WebSocket notifications
Graceful degradation: cancellation handlers prevent interface hanging on errors
localStorage backup: useLocalStorage hook persists critical session data
Failure-count ceilings: polling mechanisms cap retry attempts to prevent infinite loops
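
Two of the resilience patterns above, sketched under assumptions. The 3-attempt limit and 10s cap come from the slide; the 4-second base delay, the `reconnectDelay` and `createDeduper` names, and the idea of keying deduplication on a message id are hypothetical.

```typescript
const MAX_ATTEMPTS = 3; // from the slide
const MAX_DELAY_MS = 10000; // from the slide (10s cap)
const BASE_DELAY_MS = 4000; // assumption: the base delay is not stated

// Delay before reconnect attempt n (1-based), or null once attempts are exhausted.
function reconnectDelay(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_ATTEMPTS) return null;
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

// Set-based deduplication: returns true the first time an id is seen.
function createDeduper(): (id: string) => boolean {
  const seen = new Set<string>();
  return (id) => {
    if (seen.has(id)) return false; // duplicate: caller should drop it
    seen.add(id);
    return true;
  };
}
```

With these constants the delays are 4s, 8s, then 10s (capped from 16s), after which the caller stops retrying.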

Responsive Design Strategy

useBreakpoints: Hook for mobile/tablet/desktop detection
Conditional Rendering: Different layouts per viewport (flex vs stack)
SCSS Global Selectors: :global(.mobile) pattern for responsive overrides
Breakpoints: 841px and above (desktop), 564px to 840px (tablet), below 564px (mobile)
Touch Targets: Minimum 44×44 pt per Apple Human Interface Guidelines
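
The breakpoint values can be read as a simple width classifier. This sketch assumes each value is an inclusive lower bound, which the slide does not state; `deviceForWidth` is a hypothetical helper, not the useBreakpoints hook itself.

```typescript
type Device = "mobile" | "tablet" | "desktop";

const DESKTOP_MIN_PX = 841; // from the slide
const TABLET_MIN_PX = 564; // from the slide

// Assumption: each breakpoint is an inclusive lower bound.
function deviceForWidth(widthPx: number): Device {
  if (widthPx >= DESKTOP_MIN_PX) return "desktop";
  if (widthPx >= TABLET_MIN_PX) return "tablet";
  return "mobile";
}
```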

Performance Optimizations

Dynamic Imports: Practice dialogs use ssr:false for code splitting
Memoized Components: Expensive renders cached until dependencies change
Optimistic Updates: UI updates immediately with backend rollback on error
10-Second Poll: continuouslyFetchRoomsDelay constant for room status
SVG Icons: Vector graphics (not font icons) prevent FOIT during loading
Tree Shaking: Barrel exports in Index.tsx enable per-route optimization

Architecture Quality: TypeScript strict mode, CSS Modules, and component-level isolation support long-term maintainability. Enterprise-grade patterns for institutional deployment.

Section 7

The Cognitive Learning Loop

Phase 1: LISTEN

Comprehensible Input (Krashen's Theory)

Real-time WebSocket translation delivers slightly advanced content
Bilingual subtitles in English ↔ Mandarin/Cantonese simultaneously
Reduces anxiety by removing language comprehension barriers
Students access content they wouldn't understand otherwise
Source message preserves original language for practice reference

Phase 2: CAPTURE

Noticing Hypothesis (Schmidt)

Note capture forces attention to form — converts input to intake
Word selection with entity recognition (green/blue color coding)
Metacognitive categorization: 9 tag types for self-awareness
250-character remark constraint for concise summarization
Auto-tagging "To Practice" for zero-friction vocabulary capture
Multi-word selection (≥5) auto-generates scripts for Read & Speak

Phase 3: REVIEW

Output Practice (Swain's Hypothesis)

MyNoteBookDialog organizes notes by category for structured review
Full-text search across all historical notes and sessions
Vocabulary bank with practice count tracking per word
Session history as learning journal — every class becomes reviewable
±50 character context window preserves usage context
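
The ±50 character context window can be sketched as a clamped slice around the selection. `contextWindow` and its parameters are hypothetical, and the offsets are counted in UTF-16 code units, which is an assumption about how the platform measures characters.

```typescript
const CONTEXT_RADIUS = 50; // from the slide: ±50 characters

// Returns the selection [start, end) plus up to `radius` characters on each side.
function contextWindow(
  text: string,
  start: number,
  end: number,
  radius: number = CONTEXT_RADIUS,
): string {
  const from = Math.max(0, start - radius); // clamp at the beginning
  const to = Math.min(text.length, end + radius); // clamp at the end
  return text.slice(from, to);
}
```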

Phase 4: MASTER

Metacognitive Training + Spaced Repetition

FlashCardDialog with three states: Need Review → Active → Mastered
80-90% archive threshold — quality over quantity mandate
Practice countMax enforcement prevents gaming the system
Phoneme-level scoring at each practice attempt
Mascot feedback (5-tier system) maintains motivation
Dashboard tracks mastery progression over time

This isn't accidental: Every technical decision was made with pedagogical theory in mind. Krashen for input, Schmidt for noticing, Swain for output — the platform operationalizes second language acquisition research into engineering.

Section 8

Vocabulary Acquisition Probability Engine

Importance Algorithm

The hasWordEntity function creates a prioritized learning queue based on the student's personal database (taggedWordListData).

Known Territory (Green): Words already in vocabulary receive sessionHistoryWordWithEntity class — signal for reinforcement practice
Zone of Proximal Development (Blue): New words represent learning targets — students see exactly what needs attention
Confidence Calibration: Color coding tells students at a glance what to review vs. what to learn
Personalized Priority: Algorithm considers word frequency in sessions, practice history, and mastery state
Prevents Overwhelm: Known words don't clutter the learning interface — focus stays on new acquisition

Progress Visualization

Visual feedback through the color-coded confidence system:

🔵 New Words: Zone of Proximal Development
🟢 Known Words: Reinforcement Target
⭐ Mastered: 80-90%+ Score Archive

Contextual Capture Workflow

Step 1: Select Word from transcript

↓

Step 2: Validate (English only, no Chinese)

↓

Step 3: Match Entity against vocabulary database

↓

Step 4: Categorize with tags (auto "To Practice")

↓

Step 5: Add to Practice Queue for flashcard review

Validation Rules

const wordsToAdd = extractEnglishWords(selectedTextRange.text);
if (wordsToAdd.length === 0) {
  message.warning('Word must contain English');
  return;
}
// Chinese rejection + English requirement enforced:
// containsChinese() → FALSE, containsEnglish() → TRUE

Strict validation prevents mixed-language confusion during flashcard practice. Maintains vocabulary database integrity for effective spaced repetition.

Section 9

Pedagogical UX Design & Progressive Disclosure

Vygotskian Scaffolding Architecture

The tour system implements the Gradual Release of Responsibility model (Pearson & Gallagher, 1983), matching learning theory with UX design.

Teacher Modeling (Steps 1-2): System demonstrates features → student observes
Guided Practice (Steps 3-5): System leads → student follows with scaffolded support
Collaborative Learning (Steps 6-8): System collaborates → student applies with hints
Independent Application (Steps 9-10): Student navigates → system celebrates completion
Each phase reduces scaffolding while increasing student agency
Progressive disclosure prevents overwhelm — features revealed only when relevant

Micro-Interaction Design for Engagement

250-Character Constraint: Mirrors Cornell Notes methodology — encourages concise thinking over verbatim transcription
Dual-Mode Editing: Review vs Capture modes create seamless transitions without interrupting learning flow
Auto-Sizing Textarea: minRows: 1, maxRows: 4 — contextual expansion prevents visual clutter
AutoComplete Suggestions: "Key Point", "Don't Understand", "Exam Material", "Review Later" reduce cognitive load during categorization
Immediate Feedback: Every interaction produces visible response within 200ms (perceived instant)

Cognitive Load Management

Font Size: 24px desktop / 18px mobile — accommodates bilingual reading (EN + ZH)
Line Height: 32px desktop / 26px mobile — prevents visual crowding during simultaneous EN-ZH reading
Max Width: 75% desktop / 90% mobile — optimal reading line length (~66 characters)
Spacing: 60px top padding creates "classroom whiteboard" separation on desktop
Visual Hierarchy: Color-coded categories, icon-based navigation, progressive disclosure

AI Presence & Interruption Design

nodRocket Animation: Blue Ari mascot "nods" while the AI processes, mimicking a human teacher's non-verbal cues during wait time
Addresses Uncanny Valley: Students perceive active engagement, not dead time
Skip Button (opacity: 0 → 1 on hover): Students can interrupt AI mid-explanation
Builds Speaking Confidence: Interrupting in conversation is a real-world skill
Activity State Awareness: "Clear History" hidden during active AI explanation — critical safeguard for young learners

Research-Backed: Every micro-interaction is grounded in learning science. The 250-char constraint (Cornell Notes), progressive disclosure (Vygotsky), and affective feedback loops (Krashen) transform UX into pedagogy.
