ESI Psychometric Properties

The Early Social Indicator (ESI) was developed in a program of research designed to test its soundness as a measure of early social skill (see Carta, Greenwood, Luze, Cline, & Kuntz, 2004). Some of the important features of soundness, or the technical adequacy expected of any sound measure, are reliability and validity.

A measure is reliable when two observers simultaneously recording a child’s performance return the same, or nearly the same, score. A measure is also reliable when a child’s score on one occasion is comparable to that obtained on another occasion separated by only a very brief period of time (e.g., several days).

A measure is valid when it is shown to measure what it is supposed to measure, in this case, early social skill.

  • One proof of validity is a significant correlation between the ESI and a standardized parent interview measuring social-emotional competence, such as the Vineland Social Emotional Early Childhood Scales (Sparrow, Balla, & Cicchetti, 1998).
  • A second proof is a significant correlation between the ESI and an observational measure of children’s social interaction (Howes, 1980).
  • A third proof would be finding a significant difference in the social proficiency of older children measured by the ESI compared to younger children – because in general, we expect older children to be more proficient than younger children in the birth to 3 years age range.

Sample Description

Children were recruited at 5 child care centers serving infants and toddlers located in metropolitan Kansas City. The centers served children of varied racial and socioeconomic backgrounds and 3 centers also served children with special needs. Centers represented a range of parochial and private sponsoring groups and some were affiliated with neighboring high schools serving adolescent mothers. Any child in the 0 to 36 month age range was eligible to participate in the study. Each eligible child’s parent received a packet of information that included an informed consent form and demographic questionnaire. Any child whose parents returned a signed informed consent participated over the next 9 months. The modal level of mothers’ attained education was 12th grade, with 21% of mothers indicating that this was the highest level of completed schooling. Mothers’ reported levels of attained education beyond high school included: vocational education, 4 (9%), bachelor’s degree, 18 (42%), master’s degree, 9 (21%), and the doctorate, 3 (7%).


Parents of 9 of the 57 children reported that their child had some type of special need. These included five children with Down Syndrome (one with a ventricle heart defect; another with retinopathy of prematurity [ROP], that is, an inflammation of the retina of the premature infant; and another with bronchopulmonary dysplasia [BPD] who experienced asthma and occasional difficulties breathing); two with pervasive developmental disability (PDD); one with Williams Syndrome; and one with a seizure disorder. Each of these nine children had an Individualized Family Service Plan (IFSP) at enrollment. Five additional children were subsequently identified with a social delay by research staff based on social developmental quotient scores at or below 1.5 SD below the mean on the Vineland (Sparrow, Balla, & Cicchetti, 1998). Three of these five children identified by research staff also had IFSPs.

Technical Measurement Results

The general design was an accelerated longitudinal study with three age cohorts identified at the start (0-12, 13-24, and 25-36 months), with nine repeated ESI measures for each child, each observation separated by four weeks. These ESI measures occurred between beginning and ending administrations of the criterion measures. A total of 326 ESI observations were collected for 55 children. Twenty-eight percent of children had all 9 observations. Sixty-eight percent of children had between six and eight observations. Two percent had only one observation, whereas another 2% completed between two and five observations.

Reliability – Interobserver Agreement assesses the extent to which two observers produce the same score. Agreement assessments tap the extent to which two observers record the same key skill elements displayed by a child whom both are observing at the same time. High percentage agreement indicates that observers are well trained because they understand and apply the key skill element definitions in the same way in the recording process. Agreement estimates were based on a randomly selected 38% of all 326 assessments made.

Interobserver Agreement Findings (also see Interobserver Agreement Table below)

Pearson’s r was used to calculate the correlation between observers’ estimates of a child’s performance, and the paired t-test was used to test for mean differences in scores as described by Hartmann (1977). Strong to very strong correlations and the lack of a significant difference between two observers’ estimates are excellent evidence of reliability and agreement.
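The two statistics named above can be illustrated with a minimal sketch. The observer scores below are hypothetical and are not ESI data; the helper names pearson_r and paired_t are introduced only for illustration and are not taken from the original study.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the mean of x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical 6-minute counts from two observers scoring the same sessions.
observer_a = [12, 18, 7, 25, 15, 9, 21, 14]
observer_b = [13, 17, 8, 24, 15, 10, 20, 15]

r = pearson_r(observer_a, observer_b)
t, df = paired_t(observer_a, observer_b)
print(f"r = {r:.2f}, t({df}) = {t:.2f}")
```

A high r together with a t statistic that falls short of the critical value for its degrees of freedom corresponds to the pattern described above: the observers rank children similarly and neither observer scores systematically higher.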

Because of the near-zero frequencies of negative social behaviors, it was not possible to estimate occurrence reliability for them. Thus, reliability estimates were provided for the original six key skill elements and six composites without consideration of negative behavior.

  • Agreement correlations for the key skill elements ranged from .72 (Nondirected Positive Nonverbal) to .92 (Adult Positive Verbal).
  • Similar correlations for composite scores ranged from .73 (Nondirected Positive Composite) to .97 (Positive Total Social Composite).
  • The key skill element correlations were strongest for Adult Positive (range, .91 to .92 over skills), followed by Peer Positive (range, .82 to .84 over skills), followed by Nondirected Positive (range, .72 to .73 over skills).

View the complete table of ESI interobserver agreement correlations

Reliability – Split-half (Odd vs. Even)

This form of reliability tests the comparability of ESI scores based on odd-numbered observation occasions with those based on even-numbered occasions. Split-half reliability findings were:

  • The odd-even Pearson correlation was .85.
  • The odd and even mean estimates did not differ significantly (13.1 vs. 13.9 Positive Verbal Social Behaviors recorded in 6 minutes).
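One way an odd-even split can be computed is sketched below. It assumes each child's scores are averaged separately over odd and even occasions and that the two sets of means are then correlated across children. The source does not spell out the exact procedure, so this is an illustration under that assumption, and the data and helper names are hypothetical.

```python
import math
from statistics import mean


def odd_even_means(scores):
    """Mean of odd-numbered and even-numbered occasions for one child.

    `scores` is ordered by occasion and needs at least two entries.
    """
    odd = scores[0::2]   # occasions 1, 3, 5, ...
    even = scores[1::2]  # occasions 2, 4, 6, ...
    return mean(odd), mean(even)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 6-minute counts over nine occasions (not ESI data).
children = {
    "child_a": [4, 6, 5, 8, 9, 10, 12, 11, 14],
    "child_b": [10, 12, 13, 12, 15, 17, 16, 19, 20],
    "child_c": [1, 2, 2, 3, 3, 5, 4, 6, 7],
    "child_d": [15, 14, 18, 19, 20, 22, 21, 25, 24],
}

pairs = [odd_even_means(scores) for scores in children.values()]
odd_scores = [p[0] for p in pairs]
even_scores = [p[1] for p in pairs]

r = pearson_r(odd_scores, even_scores)
print(f"odd-even r = {r:.2f}")
print(f"odd mean = {mean(odd_scores):.1f}, even mean = {mean(even_scores):.1f}")
```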

Reliability – Alternate Toy Forms

This test of reliability compares ESI scores obtained when observations were made using alternate toys, in this case: the Tub of Toys (TT), Kitchen with Dishes (KD), and Window House (WH). The Alternate Toy Forms reliability findings were:

Pearson correlations were positive, ranging from strong to moderate in size:

  • 0.71 (TT versus WH)
  • 0.70 (WH versus KD)
  • 0.57 (TT versus KD)

PVSB (Positive Verbal Social Behaviors) mean occurrence estimates in 6 minutes were:

  • 15.5 versus 15.8 (TT versus WH)
  • 15.8 versus 10.2 (WH versus KD)
  • 15.5 versus 10.2 (TT versus KD)

Tests of differences between means indicated that KD produced PVSB estimates that were on the order of 5 responses per session lower than those of either TT (p = 0.001) or WH (p = 0.001). The TT versus WH contrast was not significantly different.

Taken together these findings suggested that KD was not an equivalent form. Thus, its use in ESI assessments is not recommended.

Criterion Validity

Tests of criterion validity were conducted to test whether or not ESI scores correlated with other measures of social competence and play. Two measures differing in informant and method of assessment were used. Together, these measures provided information from both parents (an interview) and trained observers (direct observation).

Does the ESI measure early social competence and play skill?

Correlations were positive, ranging from poor to moderate in size (also see Correlation Table below):

  • Correlations between the Vineland’s interpersonal and play/leisure scales and the PVSB mean intercept value were moderately large at .65 and .62, respectively.
  • Vineland scale correlations with the Positive Nonverbal Social Behavior (PNVSB) intercept value were smaller at .39 and .29, respectively.
  • Vineland scale correlations with the Positive Total Social Behavior (PTSB) intercept value were .45 and .52, respectively.
  • The Howes simple social play scale and overall composite were positive, moderately large correlates of the PVSB mean intercept value (r = .47 and r = .34, respectively).
  • Similar significant correlations were observed between the Howes parallel/mutual scale and overall composite and the PNVSB scale (r = .44 and r = .33, respectively).

View Criterion Validity Correlations Linking the Vineland, Howes, and ESI (Positive Verbal, Nonverbal, and Total Social Composites)

Is the ESI sensitive to age differences in early social behavior?

In general, children in the older cohorts produced more of each positive social behavior or composite compared to children in the younger cohorts (see table below for details).

  • Change in the focus (adult, peer, nondirected) of social behavior. In general, positive social behavior was directed most often to the adult, next most often to either or both partners (nondirected), and least often to the peer. More social behaviors occurred in Cohort 3 than in Cohort 2, and more in Cohort 2 than in Cohort 1. In Cohort 1 compared to 2 and 3, positive social responding was increasingly directed to all three foci over nine months.
  • Changes in positive nonverbal and verbal social behavior. Similar analyses across age cohorts showed growth in positive verbal (PVSB), positive nonverbal (PNVSB), and total social (TSB) behaviors.

View table of age differences in key skills by age cohort (PDF file)

Is the Total Social Composite sensitive to changes in Key Skill?


  • Children in the sample produced a mean rate of 5.04 (SD = 2.7, N = 57) social behaviors (positive and negative) per minute, or a total of 30.2 (SD = 16.1) responses in 6 minutes, with considerable variation across children.
  • Negative responding occurred rarely, if at all: 0.02 (SD = 0.28) negative versus 30.22 (SD = 15.82) positive total responses in 6 minutes.
  • Positive verbal (M = 2.39 responses/min) and positive nonverbal (M = 2.65 responses/min) social behaviors occurred at nearly equal rates, with nonverbal only slightly more frequent than verbal.
  • Children were most likely to direct their positive social behavior to the adult, next to either or both (nondirected), and least to the peer.
  • In the case of the adult (2.09 per min versus 1.37 per min) and peer (0.34 per min versus 0.11 per min), children were more likely to interact nonverbally than verbally.
  • The opposite held for nondirected interactions, where children were more likely to produce verbal than nonverbal social behaviors (0.91 per min versus 0.21 per min).
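The per-minute rates in the bullets above can be cross-checked with simple arithmetic. The sketch below sums the six reported rates by focus and by mode and converts them to 6-minute session totals. The rates are copied from the text; the variable and function names are introduced only for illustration.

```python
SESSION_MINUTES = 6

# Mean responses per minute, copied from the bullets above.
RATES_PER_MIN = {
    ("adult", "nonverbal"): 2.09,
    ("adult", "verbal"): 1.37,
    ("peer", "nonverbal"): 0.34,
    ("peer", "verbal"): 0.11,
    ("nondirected", "verbal"): 0.91,
    ("nondirected", "nonverbal"): 0.21,
}


def totals(position):
    """Sum per-minute rates by focus (position 0) or by mode (position 1)."""
    out = {}
    for key, rate in RATES_PER_MIN.items():
        out[key[position]] = out.get(key[position], 0.0) + rate
    return out


by_focus = totals(0)
by_mode = totals(1)

# Foci ordered from most to least frequent.
focus_order = sorted(by_focus, key=by_focus.get, reverse=True)

for mode, rate in by_mode.items():
    print(f"{mode}: {rate:.2f}/min, {rate * SESSION_MINUTES:.1f} per session")
print("focus order:", focus_order)
```

The component rates sum to 2.39 per minute for verbal and 2.64 per minute for nonverbal behavior, which agrees with the reported means of 2.39 and 2.65 to within rounding.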

View trends in key skill elements over time

Is the ESI sensitive to growth over time?

View the ESI Growth Chart

Analysis of PVSB indicated a significant mean intercept and slope. The mean intercept at 36 months of age was 19.9 Positive Verbal Social Behaviors in 6 minutes. The mean slope, or rate of growth, was 0.61 verbal social behaviors per month of age, or about 0.15 per week.
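Read as a straight line, the reported intercept and slope imply an expected PVSB score at any age within the sampled range. The sketch below assumes a simple linear trajectory centered at 36 months, which is one reading of the reported values and not necessarily the exact growth model that was fitted. The function name expected_pvsb and the 4-weeks-per-month conversion are assumptions made for illustration.

```python
INTERCEPT_AT_36_MONTHS = 19.9  # mean PVSB in 6 minutes at 36 months (from the text)
SLOPE_PER_MONTH = 0.61         # mean growth in PVSB per month of age (from the text)
WEEKS_PER_MONTH = 4            # assumption used to convert the monthly slope


def expected_pvsb(age_months):
    """Expected PVSB count in 6 minutes under a linear trajectory.

    Intended only for ages within the sampled 0 to 36 month range.
    """
    return INTERCEPT_AT_36_MONTHS + SLOPE_PER_MONTH * (age_months - 36)


for age in (12, 24, 36):
    print(f"{age} months: {expected_pvsb(age):.2f} PVSB in 6 minutes")

print(f"weekly growth: {SLOPE_PER_MONTH / WEEKS_PER_MONTH:.2f}")
```

Because the intercept is anchored at 36 months, the line is evaluated by stepping back from that age; extrapolating outside the sampled range is not supported by the reported values.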

Overlaid on these benchmarks in the ESI growth chart are the actual trajectories of the five children identified at risk for a social developmental delay on the Vineland (2.0 SD below the mean). As can be seen, these children (i.e., children 528, 538, 905, 906, 907) consistently fell below the mean PVSB growth trajectory and at times below the -1.0 SD trajectory benchmark. Slopes for these children were typically flat or declining over time.

View table of ESI Total Social Normative Values

Analysis of Total Social indicated a significant mean intercept (the mean at 36 months of age) and slope, indicating growth over time. The mean intercept at 36 months of age was 38.5 social behaviors in 6 minutes (or 6.4 behaviors per minute). The mean slope was 0.89 per month (or 0.15 per minute). On average, children were adding about 1 new behavior per 6-minute session per month. Children produced from 6 to 7 social behaviors per minute.