Module 6: Correlation Mining

Case Study

Author

LASER Institute

Published

March 24, 2025

Prepare

1. Review the Research

Our first case study is inspired by Rowe et al. (2021). This paper explores the use of Zoombinis, an educational puzzle game, as a tool for assessing implicit computational thinking (CT) practices in learners. By analyzing in-game actions and applying educational data mining techniques, the authors developed automated detectors to measure problem-solving strategies, validating these assessments against external CT measures.


Correlation Mining in this Paper

This study investigates the use of educational data mining and game-based learning analytics to assess implicit computational thinking (CT) in students playing the puzzle game Zoombinis. Rowe et al. (2021) developed automated detectors to analyze gameplay data and identify key CT practices, such as problem decomposition, pattern recognition, abstraction, and algorithm design.

The study aims to answer:

  1. What indicators of implicit CT can be reliably predicted using automated detectors in Zoombinis?

  2. How do in-game CT measures relate to external CT assessments?

By analyzing gameplay logs, Rowe et al. (2021) built machine-learning models to predict students’ CT behaviors, which were then validated against standardized external CT assessments.


Correlation Mining in the Study

1. Purpose of Correlation Mining

The study employs correlation mining to evaluate the relationship between:

  • In-game CT behaviors (detected via game log analysis), and

  • External CT assessment scores (collected from standardized post-tests).

By computing Pearson correlations, the researchers measure how strongly in-game behaviors predict real-world CT skills.
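To make this concrete, here is a minimal sketch of a Pearson correlation between a per-student count of a gameplay behavior and an external test score. The numbers below are invented for illustration; they are not from the paper:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: per-student counts of systematic-testing moves
# and external CT post-test scores (illustrative values only)
systematic_testing = np.array([2, 5, 1, 7, 4, 6, 3, 8])
ct_scores = np.array([61, 74, 58, 83, 70, 79, 66, 88])

# pearsonr returns the correlation coefficient r and a two-sided p-value
r, p = pearsonr(systematic_testing, ct_scores)
print(f"r = {r:.2f}, p = {p:.4f}")
```

A positive r close to 1 indicates that students who used the behavior more often also scored higher; the p-value tests whether the association could plausibly be zero.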


2. Key Findings from Correlation Mining

A. Relationship Between Problem-Solving Strategies and CT Scores

Correlation analysis shows that:

  • Effective strategies (Systematic Testing, Full Solution Implementation) are positively correlated with higher CT scores.

  • Ineffective strategies (Trial and Error, Acting Inconsistently with Evidence) are negatively correlated with CT performance.

CT Strategy                          Correlation with CT Scores
Systematic Testing                   0.12 to 0.18 (positive)
Implementing Full Solution           0.20 to 0.24 (positive)
Trial and Error                      -0.09 to -0.18 (negative)
Acting Inconsistent with Evidence    -0.11 to -0.24 (negative)

Key Insight: Players who systematically tested hypotheses and implemented solutions efficiently performed better in external CT assessments.


B. Correlation Between Gameplay Efficiency & CT Scores

Players demonstrating efficient gameplay mechanics had better CT scores, while struggling players (e.g., those repeatedly failing to recognize patterns) performed worse.

Gameplay Behavior                          Correlation with CT Scores
Highly Efficient Gameplay                  0.18 to 0.24 (positive)
Learning Game Mechanics (slow learning)    -0.20 to -0.25 (negative)

Key Insight: More efficient players exhibited stronger CT skills, whereas those struggling with game mechanics scored lower on post-tests.


C. Correlation of Implicit Algorithmic Strategies

Specific game strategies showed different levels of association with external CT scores.

Puzzle Strategy                   Correlation (r)
Pizza Pass One at a Time          0.13 (positive)
Winnowing                         -0.20 (negative)
Mudball Wall Maximizing Dots      0.25 (positive)
Alternating Color & Shape         0.13 (positive)

Key Insight:

  • The “Maximizing Dots” strategy was the strongest predictor of CT success.

  • The “Winnowing” strategy had a negative correlation, possibly due to inefficient trial-and-error play.


3. Model Performance & Predictive Accuracy

The researchers built automated detectors to identify in-game CT behaviors. These models were evaluated using AUC (Area Under the ROC Curve) scores, which measure how well a detector ranks true instances of a behavior above non-instances (1.0 is perfect, 0.5 is chance).
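As a quick refresher on what an AUC score means, the sketch below computes one with scikit-learn. The labels and detector confidences are invented for illustration; they are not from the paper:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical example: 1 = clip labeled by a human coder as showing
# the strategy, 0 = labeled as not showing it
labels = np.array([0, 0, 1, 1, 0, 1, 1, 0])

# Detector confidence scores for the same eight clips
scores = np.array([0.10, 0.35, 0.80, 0.65, 0.25, 0.90, 0.55, 0.60])

# AUC: probability that a randomly chosen positive clip is ranked
# above a randomly chosen negative clip
auc = roc_auc_score(labels, scores)
print(f"AUC = {auc:.2f}")  # prints AUC = 0.94
```

Here one negative clip (0.60) outranks one positive clip (0.55), so the AUC falls just short of a perfect 1.0.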

Detector                               AUC Score
Pizza Pass One at a Time Strategy      0.92
Highly Efficient Gameplay              0.91
Mudball Wall Maximizing Dots           0.89
Implementing Full Solution             0.88
Allergic Cliffs Pattern Recognition    0.81

Key Insight:

  • The “One at a Time” strategy and “Maximizing Dots” were the most reliably detected CT behaviors.

  • The automated models were highly effective, with AUC scores exceeding 0.80, indicating strong predictive power.


Conclusion

  • Correlation mining confirmed that Zoombinis gameplay data reflects real-world CT abilities.

  • Effective in-game strategies (e.g., systematic testing) correlated with better CT scores.

  • Detectors accurately predicted CT behaviors, validating the use of game-based assessments.

2. Correlation Mining

We will use a simulated dataset of gameplay behaviors and CT scores.

Step 1: Import Required Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from scipy.stats import pearsonr
from statsmodels.stats.multitest import multipletests

This imports essential libraries for data, visualization, correlation, and model evaluation.

Step 2: Generate Simulated Gameplay Data

Since actual gameplay data is unavailable, we create simulated data for 500 students, including:

  • CT strategies (e.g., Systematic Testing, Trial and Error)

  • Gameplay efficiency (e.g., Highly Efficient, Acting Inconsistent with Evidence)

  • CT assessment scores (external standardized test results)

# Set random seed for reproducibility
np.random.seed(42)

# Number of students
num_students = 500  

# Simulated gameplay behaviors
data = pd.DataFrame({
    'Trial_and_Error': np.random.randint(0, 10, num_students), 
    'Systematic_Testing': np.random.randint(0, 10, num_students), 
    'Implementing_Full_Solution': np.random.randint(0, 10, num_students),
    'Highly_Efficient_Gameplay': np.random.randint(0, 10, num_students),
    'Acting_Inconsistent': np.random.randint(0, 10, num_students),
    
    # Simulated External CT Scores
    'CT_Score': np.random.randint(50, 100, num_students)  
})

# Display first few rows
print(data.head())
   Trial_and_Error  Systematic_Testing  Implementing_Full_Solution  \
0                6                   8                           0   
1                3                   0                           7   
2                7                   0                           3   
3                4                   3                           3   
4                6                   8                           4   

   Highly_Efficient_Gameplay  Acting_Inconsistent  CT_Score  
0                          5                    9        78  
1                          3                    5        86  
2                          3                    6        69  
3                          3                    8        98  
4                          1                    0        70  
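Before running formal correlation tests, a quick visual check is often useful. The sketch below draws a heatmap of the pairwise Pearson correlations using the seaborn and matplotlib imports from Step 1; it regenerates the same simulated data (same seed) so it runs on its own:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Regenerate the simulated data from Step 2 (same seed, same columns)
np.random.seed(42)
num_students = 500
data = pd.DataFrame({
    'Trial_and_Error': np.random.randint(0, 10, num_students),
    'Systematic_Testing': np.random.randint(0, 10, num_students),
    'Implementing_Full_Solution': np.random.randint(0, 10, num_students),
    'Highly_Efficient_Gameplay': np.random.randint(0, 10, num_students),
    'Acting_Inconsistent': np.random.randint(0, 10, num_students),
    'CT_Score': np.random.randint(50, 100, num_students)
})

# Pairwise Pearson correlations between all columns
corr_matrix = data.corr()

# Annotated heatmap: red = positive, blue = negative correlation
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='coolwarm',
            vmin=-1, vmax=1)
plt.title('Correlation matrix of simulated gameplay data')
plt.tight_layout()
plt.savefig('correlation_heatmap.png')
```

Because every column here is drawn independently at random, the off-diagonal correlations should all sit near zero; with real gameplay data the heatmap would reveal which behaviors co-occur.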

Step 3: Compute Pearson Correlations

Now we compute Pearson correlation coefficients between each gameplay behavior and CT scores to analyze their relationships, then apply the Benjamini-Hochberg correction for multiple comparisons (implemented by hand below to make each step explicit).

# Compute correlations and p-values
correlations={}
p_values={}
for column in data.columns[:-1]:  # Exclude CT_Score
    corr, p_value = pearsonr(data[column], data['CT_Score'])
    correlations[column] = corr
    p_values[column] = p_value

# Convert to DataFrame
corr_df = pd.DataFrame({'Correlation': correlations, 'p_value': p_values})

def benjamini_hochberg(p_values, alpha=0.05):
    p_values = list(p_values)
    n = len(p_values)
    
    # Pair each p-value with its original index and sort ascending
    sorted_pvals = sorted(enumerate(p_values), key=lambda x: x[1])
    
    # Step-up rule: find the largest rank k whose p-value passes its
    # threshold (alpha * k / n); every rank up to k is significant
    max_significant_rank = 0
    for rank, (_, pval) in enumerate(sorted_pvals, start=1):
        if pval <= (alpha * rank) / n:
            max_significant_rank = rank
    
    # Result arrays for both the original and the sorted order:
    # columns are [p-value, threshold, significance]
    result_original = np.zeros((n, 3), dtype=object)
    result_sorted = np.zeros((n, 3), dtype=object)
    
    for rank, (orig_index, pval) in enumerate(sorted_pvals, start=1):
        threshold = (alpha * rank) / n
        significance = ('Significant' if rank <= max_significant_rank
                        else 'Not Significant')
        result_sorted[rank - 1] = [pval, threshold, significance]
        result_original[orig_index] = [pval, threshold, significance]
    
    return result_original, result_sorted
  
result_original, result_sorted  = benjamini_hochberg(p_values.values())
print("=== Original Order ===")
print("P-value | Alpha | Significance")
for row in result_original:
    print(f"{row[0]:.5f} | {row[1]:.5f} | {row[2]}")

print("\n=== Sorted by P-value ===")
print("P-value | Alpha | Significance")
for row in result_sorted:
    print(f"{row[0]:.5f} | {row[1]:.5f} | {row[2]}")
=== Original Order ===
P-value | Alpha | Significance
0.66029 | 0.03000 | Not Significant
0.80485 | 0.04000 | Not Significant
0.55144 | 0.02000 | Not Significant
0.96163 | 0.05000 | Not Significant
0.11517 | 0.01000 | Not Significant

=== Sorted by P-value ===
P-value | Alpha | Significance
0.11517 | 0.01000 | Not Significant
0.55144 | 0.02000 | Not Significant
0.66029 | 0.03000 | Not Significant
0.80485 | 0.04000 | Not Significant
0.96163 | 0.05000 | Not Significant
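Step 1 imported multipletests from statsmodels, which implements the same Benjamini-Hochberg procedure (method='fdr_bh') and also returns adjusted p-values. As a cross-check, applying it to the five p-values printed above should again flag nothing as significant:

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# The five p-values from the correlation tests, in original column order
pvals = np.array([0.66029, 0.80485, 0.55144, 0.96163, 0.11517])

# method='fdr_bh' applies the Benjamini-Hochberg step-up procedure
reject, p_adjusted, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')

for p, p_adj, rej in zip(pvals, p_adjusted, reject):
    status = 'Significant' if rej else 'Not Significant'
    print(f"p = {p:.5f} | adjusted p = {p_adj:.5f} | {status}")
```

Adjusted p-values are convenient in practice: a behavior is significant at a false-discovery rate of 0.05 exactly when its adjusted p-value is at most 0.05.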

3. Reference

Rowe, E., Almeda, M. V., Asbell-Clarke, J., Scruggs, R., Baker, R., Bardar, E., & Gasca, S. (2021). Assessing implicit computational thinking in Zoombinis puzzle gameplay. Computers in Human Behavior, 120, 106707.