Nobody built this. So we did.
131 questions. 8 developmental domains. Ages 0 to 36 months. Every question weighted by clinical evidence, every result traceable to the exact logic that produced it. Open-source, runs locally, and will never send your child's data anywhere.
npm package · npm install @mychild/engine · Apache-2.0 (code) / CC BY-SA 4.0 (data)
The problem
The CDC checklists are free, but they're just checkboxes. No logic, no scoring, no follow-up. Validated instruments like ASQ-3 and M-CHAT-R/F work, but they're copyrighted and expensive. There's nothing in between.
MyChild Engine sits in that gap. It tracks milestones longitudinally, weights each observation by clinical evidence, and tells you exactly why it flagged something. It doesn't diagnose. It helps parents and caregivers notice patterns early enough to have the right conversation with a doctor.
Under the hood
Every question gets scored against the child's corrected age. Results aggregate across domains with evidence sufficiency gates, and every single decision is traceable. No black boxes.
Weighted questions
Not all observations are equal. Each question carries a weight: Low, Medium, High, or Red-Flag. Low-weight items need corroboration before they matter. Red-flags skip the line.
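As a rough sketch in TypeScript (the names below are illustrative, not the package's published API):

// Illustrative types only, not the package's published API.
type QuestionWeight = 'low' | 'medium' | 'high' | 'red-flag';

interface Observation {
  questionId: string;
  weight: QuestionWeight;
  concerning: boolean; // the caregiver's answer suggests a possible delay
}

// Low-weight concerns only count once a second observation backs them up;
// red-flag observations escalate on their own.
const needsCorroboration = (o: Observation) => o.weight === 'low';
const skipsTheLine = (o: Observation) => o.weight === 'red-flag';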
Independent domains
Gross Motor, Fine Motor, Receptive Language, Expressive Language, Social-Emotional, Cognitive, Self-Help, Vision/Hearing. Each one scored on its own. A delay in one doesn't contaminate the others.
Age-appropriate bands
Birth through 36 months, plus universal red flags that apply at any age. The engine only asks what's developmentally relevant right now. No premature questions, no wasted anxiety.
False alarm protection
The engine won't flag "high concern" from a single observation. It requires at least 2 independent data points before escalating. One bad day shouldn't send a parent into a spiral.
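Continuing the illustrative sketch above, the gate might read like this, with a lone red-flag as the one exception the "Weighted questions" section describes:

// Illustrative only; the real thresholds are configurable in the engine.
function concernLevel(domainObs: Observation[]): 'none' | 'monitor' | 'high' {
  const concerning = domainObs.filter(o => o.concerning);
  // A single red-flag is the exception to the two-data-point rule.
  if (concerning.some(skipsTheLine)) return 'high';
  if (concerning.length >= 2) return 'high';
  return concerning.length === 1 ? 'monitor' : 'none';
}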
Preterm correction built in
If your child was born before 37 weeks, the engine automatically adjusts expectations until 24 months. Parents of preemies don't need to do mental math to figure out what's actually on track.
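The arithmetic: corrected age is chronological age minus the weeks of prematurity, and the correction stops at 24 months. The package exports computeChildAge for this; the standalone version below is a sketch with an assumed signature, not the real function.

// Illustrative sketch; the package's computeChildAge is the real entry point.
const AVG_DAYS_PER_MONTH = 30.44;

function correctedAgeMonths(dob: Date, gestationalWeeks: number, asOf = new Date()): number {
  const msPerMonth = AVG_DAYS_PER_MONTH * 24 * 60 * 60 * 1000;
  const chronological = (asOf.getTime() - dob.getTime()) / msPerMonth;
  // Correct only for births before 37 weeks, and only until 24 months.
  if (gestationalWeeks >= 37 || chronological >= 24) return chronological;
  const weeksEarly = 40 - gestationalWeeks; // 40 weeks = full term
  return chronological - weeksEarly * (7 / AVG_DAYS_PER_MONTH);
}

So a chronologically 6-month-old born at 32 weeks is screened against roughly 4-month expectations.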
Show your work
Every result comes with a plain-English explanation of exactly what drove it. Parents deserve to know why. Clinicians need to verify the reasoning. Both get it.
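For flavor, a traceable result might look like the shape below. This is illustrative, not the package's actual output format; the real shape is documented at /docs.

// Illustrative result shape; see /docs for the actual output format.
const explained = {
  domain: 'expressive-language',
  concern: 'monitor',
  because: [
    '"uses at least one word" answered "not yet" at corrected age 13.5 months (medium weight)',
    'only one concerning observation in this domain, below the two-point bar for high concern',
  ],
};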
Rule simulator
Run synthetic child timelines against different threshold configurations. Change a weight, adjust a grace period, see exactly which alerts shift and by how much. We built this because no screening tool, open or proprietary, lets you actually test the rules before deploying them.
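In code terms, the simulator boils down to running the same timelines under two configurations and diffing the alerts. A hypothetical sketch: evaluateTimeline, ChildTimeline, and ThresholdConfig are stand-in names, not the published API.

// Hypothetical sketch of the simulator idea.
interface ThresholdConfig {
  highConcernMinObservations: number; // data points required before escalating
  gracePeriodWeeks: number;           // slack added to each milestone window
}

interface ChildTimeline { id: string; events: unknown[] }

// Stand-in for the engine's real evaluation; returns ids of raised alerts.
declare function evaluateTimeline(t: ChildTimeline, c: ThresholdConfig): { alerts: string[] };

function diffAlerts(timelines: ChildTimeline[], before: ThresholdConfig, after: ThresholdConfig) {
  const run = (c: ThresholdConfig) =>
    new Set(timelines.flatMap(t => evaluateTimeline(t, c).alerts));
  const [a, b] = [run(before), run(after)];
  return {
    added: [...b].filter(id => !a.has(id)),   // alerts the new config raises
    removed: [...a].filter(id => !b.has(id)), // alerts the new config suppresses
  };
}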
We tested every rule
We verified the engine on 108 synthetic scenarios, including borderline cases. Across 294 hand-labeled domain observations, it reached 98.6% agreement (290 of 294) with expected developmental profiles. Every known delay case was flagged. The 4 disagreements were all borderline over-flags where a clinician might say "monitor and re-check."
- 98.6% agreement across labeled observations
- 0.971 Cohen's kappa (strong agreement)
- 108 scenarios, including borderline cases
- 4 threshold cases sent for clinical review
What the engine covers
- Social-Emotional: shared attention, reciprocity, affect, and parent-child interaction cues.
- Expressive Language: babbling, gestures, single words, phrases, and communicative intent.
- Gross Motor: posture, rolling, sitting, crawling, standing, walking, and coordination.
- Vision / Hearing: tracking, startle, response to voices, name response, and sensory cues.
- Cognitive / Play: problem-solving, imitation, curiosity, object use, and pretend play.
- Receptive Language: understanding words, following commands, and language comprehension.
- Fine Motor: reach, grasp, transfer, pincer skills, and hand use in daily activity.
- Self-Help / Adaptive: feeding, dressing, routines, and age-appropriate daily living skills.
What this proves
- The engine is built on CDC 2022 milestones across 8 developmental domains
- Every known delay case in the verification set was flagged
- The disagreements were all cautious over-flags, not misses
- Regression detection works, and the engine abstains when there isn't enough data
- The rules are open source and can be inspected end to end
What this does NOT prove
- Clinical validation with real patient data
- Gold-standard sensitivity or specificity in real-world populations
- Benchmarking against ASQ-3, M-CHAT-R/F, Denver, or RBSK outcomes
- Final clinical threshold calibration
- Performance across every language, culture, and care setting
The engine knows when to shut up. In 45 of 294 observations (15.3%), it returned insufficient_evidence instead of making a call. Fewer than 2 answered questions in a domain? It declines to classify. All 45 of those abstentions were correctly negative in the verification set.
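The gate itself fits in a few lines. Reusing the illustrative types and concernLevel from the sketches above:

const MIN_ANSWERED = 2; // fewer answered questions than this in a domain: abstain

type DomainResult =
  | { status: 'insufficient_evidence' }
  | { status: 'classified'; concern: 'none' | 'monitor' | 'high' };

function classifyDomain(answered: Observation[]): DomainResult {
  if (answered.length < MIN_ANSWERED) return { status: 'insufficient_evidence' };
  return { status: 'classified', concern: concernLevel(answered) };
}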
Reproduce it yourself
npm install @mychild/engine
npx mychild-engine validate --profiles data/synthetic-profiles.json --format markdown
India
India sees 26 million births a year. The screening infrastructure doesn't scale.
RBSK (Rashtriya Bal Swasthya Karyakram) is India's national child health screening program. It's been independently validated at 97% sensitivity and 96.4% specificity against ASQ-3. The problem isn't the tool. It's that there aren't enough trained health workers to run it at the scale India needs.
We've mapped our question bank domain-by-domain against RBSK screening items. The questions align. Community health workers running RBSK screenings could use the same questions with the engine's scoring logic on top, on a phone, offline, without a clinician standing next to them.
To be clear: mapping our questions to RBSK doesn't make this RBSK-validated. The scoring logic hasn't been tested against RBSK benchmarks or with Indian populations. This is alignment, not endorsement. The validation work is still ahead of us.
What we're building next
The engine works. Now we're making it smarter.
Adaptive questioning
Right now the engine asks every question in a domain. It shouldn't have to. If the first three answers are all "yes, doing this easily," the remaining motor questions probably don't need asking. Fewer questions, less parent fatigue, same signal.
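One plausible shape for the early stop, sketched below. None of this is implemented yet; the names are hypothetical.

// Hypothetical rule: three clear passes in a row let the engine skip the
// remaining low- and medium-weight questions in that domain. Red-flag
// questions would always still be asked.
function canStopDomainEarly(passes: boolean[]): boolean {
  return passes.length >= 3 && passes.slice(0, 3).every(Boolean);
}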
NLP probe analysis
Parents don't think in yes/no checkboxes. "He kind of does it but only when he's in a good mood" has real signal in it. We want the engine to understand free-text responses and extract what matters without losing the nuance.
Longitudinal pattern recognition
One screening is a snapshot. Three screenings over six months is a trajectory. We want the engine to distinguish a late bloomer from a persistent delay, and catch regression patterns before a parent has to notice them on their own.
Recognized by
Google made a documentary about this
Back when MyChild App was live, Google filmed the story of how a kid with dyspraxia built a screening tool used in 100+ countries. This is where the engine came from.
Five minutes to your first screening
# Install
npm install @mychild/engine
// Use
import { evaluate, computeChildAge, getDueQuestions } from '@mychild/engine';
// Get age-appropriate questions for a 7-month-old
const questions = getDueQuestions({ dob: new Date('2025-09-01') }, []);
// Returns 12 questions across Motor, Language, Cognitive, Social
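Feeding answers back in then looks something like the snippet below. The evaluate signature and the question id shown are assumed for illustration; check /docs for the real shape.

// Assumed shape, for illustration only; see /docs for the actual signature.
const result = evaluate({
  child: { dob: new Date('2025-09-01'), gestationalWeeks: 34 },
  answers: [{ questionId: 'gm-sits-without-support', answer: 'yes' }],
});
// result: per-domain statuses plus the reasoning trail behind each one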
Full API docs, architecture walkthrough, and integration guide at /docs. Package page on npm.
Where the questions come from
1. Zubler JM, Wiggins LD, Macias MM, et al. "Evidence-Informed Milestones for Developmental Surveillance Tools." Pediatrics. 2022;149(3):e2021052138. doi:10.1542/peds.2021-052138
2. Lipkin PH, Macias MM; Council on Children with Disabilities. "Promoting Optimal Development: Identifying Infants and Young Children With Developmental Disorders Through Developmental Surveillance and Screening." Pediatrics. 2020;145(1):e20193449. doi:10.1542/peds.2019-3449
3. CDC "Learn the Signs. Act Early." Program: public-domain milestone checklists for caregiver-facing developmental monitoring. cdc.gov/act-early/milestones
4. RBSK (Rashtriya Bal Swasthya Karyakram), India's National Child Health Screening Program. Validated at 97% sensitivity and 96.4% specificity against ASQ-3. rbsk.mohfw.gov.in
5. "The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures." BMC Medical Informatics and Decision Making. 2019. doi:10.1186/s12911-019-0793-0
6. "Synthetic data can aid the analysis of clinical outcomes: How much can it be trusted?" PNAS. 2024. doi:10.1073/pnas.2414310121
7. "Synthetic Validation of Pediatric Trust Instruments Using LLMs." medRxiv preprint. 2025. medrxiv.org/10.1101/2025.11.25.25340922v1
Please read this
This is not a diagnostic tool. It tracks developmental milestones and surfaces patterns. It cannot and does not diagnose any medical condition, developmental disorder, or disability. If something concerns you about your child's development, talk to your pediatrician. That conversation is the whole point.
We've verified the scoring logic against 108 synthetic scenarios, including borderline cases. Across 294 hand-labeled domain observations, the engine reached 98.6% agreement and flagged every known delay case in the set. That still does NOT make it clinically validated. Real-patient validation is the next milestone. The question bank draws from publicly available CDC milestone checklists, not copyrighted instruments like ASQ-3, M-CHAT-R/F, or Denver. Everything runs locally on your device. No child data leaves your machine.