Back after 10 years. The problem never went away.

Nobody built this. So we did.

131 questions. 8 developmental domains. Ages 0 to 36 months. Every question weighted by clinical evidence, every result traceable to the exact logic that produced it. Open-source, runs locally, and will never send your child's data anywhere.

npm package · npm install @mychild/engine · Apache-2.0 (code) / CC BY-SA 4.0 (data)

The problem

The CDC checklists are free, but they're just checkboxes. No logic, no scoring, no follow-up. Validated instruments like ASQ-3 and M-CHAT-R/F actually work, but they're copyrighted, and licensing can be restrictive or costly. There's nothing free, open, and scored in between.

MyChild Engine sits in that gap. It tracks milestones longitudinally, weights each observation by clinical evidence, and tells you exactly why it flagged something. It doesn't diagnose. It helps parents and caregivers notice patterns early enough to have the right conversation with a doctor.

Under the hood

Every question gets scored against the child's corrected age. Answers aggregate within each domain behind evidence sufficiency gates, and every single decision is traceable. No black boxes.

131 weighted questions

Not all observations are equal. Each question carries a weight: Low, Medium, High, or Red-Flag. Low-weight items need corroboration before they matter. Red-flags skip the line.
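Here's one way to picture those weighting rules, as a minimal sketch rather than the engine's actual code. The `Weight` type, the `MissedItem` shape, and both functions are hypothetical names. "Corroboration" is assumed here to mean at least one other missed item in the same domain.

```typescript
// Hypothetical types for illustration; not the engine's real API.
type Weight = 'low' | 'medium' | 'high' | 'red-flag';

interface MissedItem {
  id: string;
  domain: string;
  weight: Weight;
}

// Assumption: a low-weight miss only counts once another missed item
// in the same domain corroborates it. Everything else counts directly.
function countsAsSignal(item: MissedItem, allMissed: MissedItem[]): boolean {
  if (item.weight !== 'low') return true;
  return allMissed.some(
    (other) => other.id !== item.id && other.domain === item.domain
  );
}

// Keeps only the missed items that should contribute to a domain's result.
function signalItems(allMissed: MissedItem[]): MissedItem[] {
  return allMissed.filter((item) => countsAsSignal(item, allMissed));
}

// Red flags "skip the line": surface them without waiting for corroboration.
function hasRedFlag(allMissed: MissedItem[]): boolean {
  return allMissed.some((item) => item.weight === 'red-flag');
}
```

In this sketch a lone low-weight miss is dropped, while the same miss alongside a second miss in its domain is kept.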

8 independent domains

Gross Motor, Fine Motor, Receptive Language, Expressive Language, Social-Emotional, Cognitive, Self-Help, Vision/Hearing. Each one scored on its own. A delay in one doesn't contaminate the others.

10 age-appropriate bands

Birth through 36 months, plus universal red flags that apply at any age. The engine only asks what's developmentally relevant right now. No premature questions, no wasted anxiety.

False alarm protection

The engine won't flag "high concern" from a single observation. It requires at least 2 independent data points before escalating. One bad day shouldn't send a parent into a spiral.
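As a rough sketch, the gate could look like the following. The concern labels and the `gateConcern` function are hypothetical; only the minimum of 2 independent data points comes from the description above.

```typescript
// Hypothetical concern levels; the engine's real labels may differ.
type Concern = 'on_track' | 'monitor' | 'high_concern';

// From the text: at least 2 independent data points before escalating.
const MIN_POINTS_FOR_HIGH = 2;

// Caps the proposed level at 'monitor' unless enough independent
// observations support the escalation.
function gateConcern(proposed: Concern, independentPoints: number): Concern {
  if (proposed === 'high_concern' && independentPoints < MIN_POINTS_FOR_HIGH) {
    return 'monitor';
  }
  return proposed;
}
```

Here, `gateConcern('high_concern', 1)` falls back to `'monitor'`, while two or more data points let the escalation through.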

Preterm correction built in

If your child was born before 37 weeks, the engine automatically adjusts expectations until 24 months. Parents of preemies don't need to do mental math to figure out what's actually on track.
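The arithmetic behind that adjustment is simple enough to sketch. This version assumes the common convention of correcting to a 40-week full term. The function name, the day-based units, and the average month length are illustrative choices, not the signature of the engine's own `computeChildAge`.

```typescript
const FULL_TERM_WEEKS = 40;        // assumption: correct to a 40-week term
const PRETERM_CUTOFF_WEEKS = 37;   // from the text: born before 37 weeks
const CORRECTION_ENDS_MONTHS = 24; // from the text: adjust until 24 months
const DAYS_PER_MONTH = 30.4375;    // average month length, an approximation

// Returns the age (in days) to score against: chronological age minus
// the weeks of prematurity, while the correction window is still open.
function correctedAgeDays(chronologicalDays: number, gestationalWeeks: number): number {
  const isPreterm = gestationalWeeks < PRETERM_CUTOFF_WEEKS;
  const withinWindow = chronologicalDays < CORRECTION_ENDS_MONTHS * DAYS_PER_MONTH;
  if (!isPreterm || !withinWindow) return chronologicalDays;

  const weeksEarly = FULL_TERM_WEEKS - gestationalWeeks;
  return Math.max(0, chronologicalDays - weeksEarly * 7);
}
```

For example, a baby born at 32 weeks who is 180 days old would be scored as 124 days old in this sketch (8 weeks, or 56 days, early).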

Show your work

Every result comes with a plain-English explanation of exactly what drove it. Parents deserve to know why. Clinicians need to verify the reasoning. Both get it.

Rule simulator

Run synthetic child timelines against different threshold configurations. Change a weight, adjust a grace period, see exactly which alerts shift and by how much. We built this because no screening tool, open or proprietary, lets you actually test the rules before deploying them.
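To make the idea concrete, here is a toy version of that comparison. Every name and shape below is hypothetical and much simpler than a real rule set: the weights are plain numbers, and the "grace period" is assumed to be the number of months past a milestone's expected age before a miss counts.

```typescript
// Hypothetical shapes for illustration; not the engine's simulator API.
interface Observation {
  expectedByMonths: number; // age by which the milestone is expected
  observedAtMonths: number; // child's age when the question was answered
  achieved: boolean;
  weight: number;           // numeric stand-in for Low/Medium/High
}

interface Timeline {
  id: string;
  observations: Observation[];
}

interface RuleConfig {
  graceMonths: number;
  alertThreshold: number;
}

// A miss counts only once the grace period past the expected age has elapsed.
function raisesAlert(timeline: Timeline, cfg: RuleConfig): boolean {
  const missedWeight = timeline.observations
    .filter((o) => !o.achieved && o.observedAtMonths >= o.expectedByMonths + cfg.graceMonths)
    .reduce((sum, o) => sum + o.weight, 0);
  return missedWeight >= cfg.alertThreshold;
}

// Returns the ids of timelines whose alert status differs between two configs.
function shiftedAlerts(timelines: Timeline[], before: RuleConfig, after: RuleConfig): string[] {
  return timelines
    .filter((t) => raisesAlert(t, before) !== raisesAlert(t, after))
    .map((t) => t.id);
}
```

Running the same synthetic timelines through `shiftedAlerts` with a longer grace period shows which children would stop (or start) being flagged.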

Verified

We tested every rule

We verified the engine on 108 synthetic scenarios, including borderline cases. Across 294 hand-labeled domain observations, it reached 98.6% agreement with expected developmental profiles. Every known delay case was flagged. The 4 disagreements were all borderline over-flags where a clinician might say "monitor and re-check."
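For readers who want to check the numbers, here is a small sketch of the two statistics involved: raw agreement and Cohen's kappa. The 2x2 matrix is a textbook-style illustration, not the verification data, and the function names are ours.

```typescript
function sum(values: number[]): number {
  return values.reduce((a, b) => a + b, 0);
}

// Rows are expected labels, columns are the engine's labels.
// kappa = (po - pe) / (1 - pe); undefined when chance agreement pe is 1.
function cohensKappa(matrix: number[][]): number {
  const total = sum(matrix.map((row) => sum(row)));
  // Observed agreement: share of cases on the diagonal.
  const po = sum(matrix.map((row, i) => row[i] ?? 0)) / total;
  // Chance agreement: product of row and column marginals per label.
  const pe = sum(
    matrix.map((row, i) => {
      const rowTotal = sum(row);
      const colTotal = sum(matrix.map((r) => r[i] ?? 0));
      return (rowTotal / total) * (colTotal / total);
    })
  );
  return (po - pe) / (1 - pe);
}

// Raw agreement from the figures above: 294 observations, 4 disagreements.
const agreement = (294 - 4) / 294;
console.log((agreement * 100).toFixed(1) + '%'); // prints "98.6%"

// Illustrative matrix only: po = 0.7, pe = 0.5, so kappa is about 0.4.
console.log(cohensKappa([[20, 5], [10, 15]]).toFixed(2)); // prints "0.40"
```

Kappa is lower than raw agreement because it discounts the matches two raters would get by chance.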

  • 98.6% agreement across labeled observations
  • 0.971 Cohen's kappa (strong agreement)
  • 108 scenarios, including borderline cases
  • 4 threshold cases, sent for clinical review

What the engine covers

Social-Emotional

Shared attention, reciprocity, affect, and parent-child interaction cues.

Expressive Language

Babbling, gestures, single words, phrases, and communicative intent.

Gross Motor

Posture, rolling, sitting, crawling, standing, walking, and coordination.

Vision / Hearing

Tracking, startle, response to voices, name response, and sensory cues.

Cognitive / Play

Problem-solving, imitation, curiosity, object use, and pretend play.

Receptive Language

Understanding words, following commands, and language comprehension.

Fine Motor

Reach, grasp, transfer, pincer skills, and hand use in daily activity.

Self-Help / Adaptive

Feeding, dressing, routines, and age-appropriate daily living skills.

What this proves

  • The engine is built on CDC 2022 milestones across 8 developmental domains
  • Every known delay case in the verification set was flagged
  • The disagreements were all cautious over-flags, not misses
  • Regression detection works, and the engine abstains when there isn't enough data
  • The rules are open source and can be inspected end to end

What this does NOT prove

  • Clinical validation with real patient data
  • Gold-standard sensitivity or specificity in real-world populations
  • Benchmarking against ASQ-3, M-CHAT-R/F, Denver, or RBSK outcomes
  • Final clinical threshold calibration
  • Performance across every language, culture, and care setting

The engine knows when to shut up. In 45 of 294 observations (15.3%), it returned insufficient_evidence instead of making a call. Fewer than 2 answered questions in a domain? It declines to classify. All 45 of those abstentions were correctly negative in the verification set.
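The abstention rule can be sketched in a few lines. The `Answer` type and the `classifyDomain` wrapper are hypothetical. Only the `insufficient_evidence` label and the minimum of 2 answered questions come from the description above.

```typescript
// From the text: fewer than 2 answered questions means no call.
const MIN_ANSWERED = 2;

// Hypothetical answer values for illustration.
type Answer = 'yes' | 'no' | 'unanswered';

// Wraps any classifier so it abstains when a domain has too little data.
function classifyDomain<T>(
  answers: Answer[],
  classify: (answered: Answer[]) => T
): T | 'insufficient_evidence' {
  const answered = answers.filter((a) => a !== 'unanswered');
  if (answered.length < MIN_ANSWERED) return 'insufficient_evidence';
  return classify(answered);
}

// Abstention rate from the figures above: 45 of 294 observations.
console.log(((45 / 294) * 100).toFixed(1) + '%'); // prints "15.3%"
```

Putting the check in a wrapper keeps the abstention decision separate from whatever scoring logic runs once there is enough data.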

Reproduce it yourself

npm install @mychild/engine

npx mychild-engine validate --profiles data/synthetic-profiles.json --format markdown

India

26 million babies are born in India every year. The screening infrastructure doesn't scale.

RBSK (Rashtriya Bal Swasthya Karyakram) is India's national child health screening program. It's been independently validated at 97% sensitivity and 96.4% specificity against ASQ-3. The problem isn't the tool. It's that there aren't enough trained health workers to run it at the scale India needs.

We've mapped our question bank domain-by-domain against RBSK screening items. The questions align. Community health workers running RBSK screenings could use the same questions with the engine's scoring logic on top, on a phone, offline, without a clinician standing next to them.

To be clear: mapping our questions to RBSK doesn't make this RBSK-validated. The scoring logic hasn't been tested against RBSK benchmarks or with Indian populations. This is alignment, not endorsement. The validation work is still ahead of us.

See the full RBSK alignment

What we're building next

The engine works. Now we're making it smarter.

1. Adaptive questioning

Right now the engine asks every question in a domain. It shouldn't have to. If the first three answers are all "yes, doing this easily," the remaining motor questions probably don't need asking. Fewer questions, less parent fatigue, same signal.

2. NLP probe analysis

Parents don't think in yes/no checkboxes. "He kind of does it but only when he's in a good mood" has real signal in it. We want the engine to understand free-text responses and extract what matters without losing the nuance.

3. Longitudinal pattern recognition

One screening is a snapshot. Three screenings over six months is a trajectory. We want the engine to distinguish a late bloomer from a persistent delay, and catch regression patterns before a parent has to notice them on their own.

Google made a documentary about this

Back when MyChild App was live, Google filmed the story of how a kid with dyspraxia built a screening tool used in 100+ countries. This is where the engine came from.

Five minutes to your first screening

# Install
npm install @mychild/engine

# Use
import { evaluate, computeChildAge, getDueQuestions } from '@mychild/engine';

// Get age-appropriate questions for a child born 2025-09-01
// (a 7-month-old as of April 2026; the result depends on today's date)
const questions = getDueQuestions({ dob: new Date('2025-09-01') }, []);
// Returns 12 questions across Motor, Language, Cognitive, Social

Full API docs, architecture walkthrough, and integration guide at /docs. Package page on npm.

Where the questions come from

  1. Zubler JM, Wiggins LD, Macias MM, et al. "Evidence-Informed Milestones for Developmental Surveillance Tools." Pediatrics. 2022;149(3):e2021052138. doi:10.1542/peds.2021-052138

  2. Lipkin PH, Macias MM; Council on Children with Disabilities. "Promoting Optimal Development: Identifying Infants and Young Children With Developmental Disorders Through Developmental Surveillance and Screening." Pediatrics. 2020;145(1):e20193449. doi:10.1542/peds.2019-3449

  3. CDC "Learn the Signs. Act Early." Program: public domain milestone checklists for caregiver-facing developmental monitoring. cdc.gov/act-early/milestones

  4. RBSK (Rashtriya Bal Swasthya Karyakram): India's National Child Health Screening Program. Validated at 97% sensitivity, 96.4% specificity against ASQ-3. rbsk.mohfw.gov.in

  5. "The validity of synthetic clinical data: a validation study of a leading synthetic data generator (Synthea) using clinical quality measures." BMC Medical Informatics and Decision Making. 2019. doi:10.1186/s12911-019-0793-0

  6. "Synthetic data can aid the analysis of clinical outcomes: How much can it be trusted?" PNAS. 2024. doi:10.1073/pnas.2414310121

  7. "Synthetic Validation of Pediatric Trust Instruments Using LLMs." MedRxiv preprint. 2025. medrxiv.org/10.1101/2025.11.25.25340922v1

Please read this

This is not a diagnostic tool. It tracks developmental milestones and surfaces patterns. It cannot and does not diagnose any medical condition, developmental disorder, or disability. If something concerns you about your child's development, talk to your pediatrician. That conversation is the whole point.

We've verified the scoring logic against 108 synthetic scenarios, including borderline cases. Across 294 hand-labeled domain observations, the engine reached 98.6% agreement and flagged every known delay case in the set. That still does NOT make it clinically validated. Real-patient validation is the next milestone. The question bank draws from publicly available CDC milestone checklists, not copyrighted instruments like ASQ-3, M-CHAT-R/F, or Denver. Everything runs locally on your device. No child data leaves your machine.