KT Learning Lab 2: A Conceptual Overview
Logistic Knowledge Tracing (LKT): a broad framework for knowledge tracing models based on logistic regression (Pavlik, Eglington, & Harrell-Williams, 2021)

Performance Factors Analysis (PFA): the first member of the LKT family that ran in real time (Pavlik et al., 2009)

Measures how much latent skill a student has, while they are learning
But expresses it in terms of probability of correctness, the next time the skill is encountered
No direct expression of the amount of latent skill, except this probability of correctness
Assess a student’s knowledge of topic X
Based on a sequence of items that are dichotomously scored
Where the student can learn on each item, due to help, feedback, scaffolding, etc.
Each item may involve multiple latent skills or knowledge components
Each skill has success learning rate γ and failure learning rate ρ
There is also a difficulty parameter β, but its semantics can vary – more on this later
From these parameters, and the number of successes and failures the student has had on each relevant skill so far, we can compute the probability P(m) that the learner will get the item correct
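Written out, the computation described above has the following form in the usual statement of PFA. The index notation is added here for illustration and is not taken from the slides: s_{ij} and f_{ij} stand for student i's prior successes and failures on skill j, the sum runs over the skills tagged on the item, and β is shown per skill even though its level can vary.

```latex
m = \sum_{j \in \mathrm{KCs}(\mathrm{item})} \left( \beta_j + \gamma_j \, s_{ij} + \rho_j \, f_{ij} \right),
\qquad
P(m) = \frac{1}{1 + e^{-m}}
```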




γ = 0.2, ρ = 0.1, β = -0.5
γ = success learning rate, ρ = failure learning rate, β = difficulty
| Actual | m | P(m) |
|---|---|---|
| 0 | -0.5 | 0.38 |
| 0 | -0.5+(0.1*1) = -0.4 | 0.40 |
| 1 | -0.5+(0.1*2) = -0.3 | 0.43 |
| | -0.5+(0.1*2)+(0.2*1) = -0.1 | 0.48 |
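The single-skill example with γ = 0.2, ρ = 0.1, β = -0.5 can be reproduced with a few lines of Python. This is a minimal sketch for one skill only; the function names `pfa_probability` and `trace` are illustrative and do not come from any PFA library.

```python
import math

def pfa_probability(beta, gamma, rho, successes, failures):
    """Return (m, P(m)) for one skill given prior success and failure counts."""
    m = beta + gamma * successes + rho * failures
    return m, 1.0 / (1.0 + math.exp(-m))

def trace(beta, gamma, rho, outcomes):
    """Predict before each observed outcome, then once more for the next opportunity."""
    successes = failures = 0
    rows = []
    for actual in list(outcomes) + [None]:  # None marks the not-yet-observed attempt
        m, p = pfa_probability(beta, gamma, rho, successes, failures)
        rows.append((actual, round(m, 2), round(p, 2)))
        if actual == 1:
            successes += 1
        elif actual == 0:
            failures += 1
    return rows

rows = trace(beta=-0.5, gamma=0.2, rho=0.1, outcomes=[0, 0, 1])
for actual, m, p in rows:
    print(actual, m, p)
```

Each row is computed from counts of earlier attempts only, so the prediction for an attempt never uses that attempt's own outcome.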
Do γ and ρ represent when the student learns from an opportunity to practice?
As opposed to just better predicted performance because you’ve gotten it right
Three degenerate cases
γ < 0
γ < ρ
γ = ρ = 0
When might you legitimately get them?
ρ < 0
γ < ρ
γ < 0
One seemingly degenerate (but not) case
“It is worth noting that a fourth case when ρ > 0 – is not degenerate, due to the multiple functions the parameters perform in PFA. In this case, the rate of learning the skill may outweigh the evidence of lack of student knowledge that an incorrect answer provides. So long as γ > ρ, a positive ρ is conceptually acceptable.”
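The three degenerate conditions can be checked mechanically once γ and ρ have been fitted for a skill. The sketch below is one possible way to do that; the function name `degenerate_cases` and the returned labels are illustrative choices, not part of any published tool.

```python
def degenerate_cases(gamma, rho):
    """List which of the three degenerate conditions a (gamma, rho) pair meets."""
    cases = []
    if gamma < 0:
        cases.append("gamma < 0")
    if gamma < rho:
        cases.append("gamma < rho")
    if gamma == 0 and rho == 0:
        cases.append("gamma = rho = 0")
    return cases

print(degenerate_cases(0.2, 0.1))    # [] (rho > 0, but gamma > rho)
print(degenerate_cases(-0.1, -0.5))  # ['gamma < 0']
print(degenerate_cases(0.1, 0.2))    # ['gamma < rho']
```

A positive ρ on its own is not flagged, in line with the quoted passage: only its relation to γ matters.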
γ = -0.1, ρ = -0.5, β = -0.5
| Actual | m | P(m) |
|---|---|---|
| 0 | -0.5 | 0.38 |
| 0 | -1 | 0.27 |
| 1 | -1.5 | 0.18 |
| | -1.6 | 0.17 |
γ = 0.1, ρ = 0.2, β = -0.5
| Actual | m | P(m) |
|---|---|---|
| 0 | -0.5 | 0.38 |
| 0 | -0.3 | 0.43 |
| 1 | -0.1 | 0.48 |
| | 0 | 0.50 |
Values of ρ below 0 don’t actually mean negative learning
They mean that failure provides more evidence on lack of knowledge
Than the learning opportunity causes improvement
Simply bound γ and ρ
Does not reduce model performance substantially (just like BKT)
What causes degeneracy? We’ll come back to this in a minute
Parameters in PFA combine information from correctness with improvement from practice
Makes PFA models a little harder to interpret than BKT
γ = 0.2, ρ = 0.1, β = -0.5
| Actual | m | P(m) |
|---|---|---|
| 0 | -0.5 | 0.38 |
| 0 | -0.4 | 0.40 |
| 1 | -0.3 | 0.43 |
| | -0.1 | 0.48 |
γ = 0.2, ρ = 0.1, β = -1.5
| Actual | m | P(m) |
|---|---|---|
| 0 | -1.5 | 0.18 |
| 0 | -1.4 | 0.20 |
| 1 | -1.3 | 0.21 |
| | -1.1 | 0.25 |
γ = 0.2, ρ = 0.1, β = +3.0
| Actual | m | P(m) |
|---|---|---|
| 0 | 3.0 | 0.953 |
| 0 | 3.1 | 0.957 |
| 1 | 3.2 | 0.961 |
| | 3.4 | 0.968 |
β = difficulty
Pavlik proposes three different β parameters
Item
Item-Type
Skill
These result in different numbers of parameters
What are the circumstances where you might want item versus skill?
If β is used at the Skill or Item-Type level
And the learning system moves students from easier to harder items within a “skill”
Then γ can come out below 0, because success counts rise just as the items get harder and correctness drops
Also, if items are tagged with multiple skills, shared variance (collinearity) between skills could produce degenerate parameters.
1. Start with initial values for each parameter
2. Estimate student correctness at each problem step
3. Estimate parameters using the student correctness estimates
4. If goodness of fit is substantially better than the last time it was estimated, and the maximum number of iterations has not been reached, go to step 2
EM is vulnerable to local minima
Randomized restart typically used
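The fitting steps and the randomized-restart idea can be sketched for a single skill as follows. This is an illustration under stated assumptions, not the procedure used in any particular PFA implementation: plain gradient ascent on the log-likelihood is swapped in for the parameter-update step, the toy `data` is invented, and the names `fit_once` and `fit_with_restarts` are hypothetical.

```python
import math
import random

def predict(params, s, f):
    """Predicted probability of correctness from (beta, gamma, rho) and prior counts."""
    beta, gamma, rho = params
    return 1.0 / (1.0 + math.exp(-(beta + gamma * s + rho * f)))

def log_likelihood(params, data):
    """Goodness of fit; data holds (prior successes, prior failures, correct) triples."""
    return sum(math.log(predict(params, s, f) if y == 1 else 1.0 - predict(params, s, f))
               for s, f, y in data)

def fit_once(data, init, step=0.05, tol=1e-6, max_iter=5000):
    params = list(init)                              # step 1: initial values
    goodness = log_likelihood(params, data)
    for _ in range(max_iter):
        grad = [0.0, 0.0, 0.0]
        for s, f, y in data:
            p = predict(params, s, f)                # step 2: estimate correctness
            for k, x in enumerate((1.0, s, f)):
                grad[k] += (y - p) * x
        candidate = [w + step * g / len(data)        # step 3: update parameters
                     for w, g in zip(params, grad)]
        new_goodness = log_likelihood(candidate, data)
        if new_goodness - goodness < tol:            # step 4: stop when the gain is negligible
            break
        params, goodness = candidate, new_goodness
    return params, goodness

def fit_with_restarts(data, n_restarts=5, seed=0):
    """Run the fit from several random starting points and keep the best run."""
    rng = random.Random(seed)
    runs = [fit_once(data, [rng.uniform(-1, 1) for _ in range(3)])
            for _ in range(n_restarts)]
    return max(runs, key=lambda run: run[1])

# Invented toy data for one skill (assumption, for illustration only).
data = [(0, 0, 0), (0, 1, 0), (0, 2, 1), (1, 2, 1),
        (0, 0, 1), (1, 0, 1), (2, 0, 0), (2, 1, 1)]
best_params, best_goodness = fit_with_restarts(data)
print(best_params, best_goodness)
```

Keeping the best of several runs is what protects against a poor starting point; how many restarts are enough depends on the model and data.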
PFA and BKT show approximately equal predictive power across many studies (Pavlik et al., 2009; Gong et al., 2010; Baker et al., 2011; Pardos et al., 2011, 2012)
Different virtues and flaws: choose the one that better fits your goals
PFA is used in real-world learning systems, but in far fewer than BKT
Maier et al. (2021) discuss its use in Reveal Math 1
One issue in real-world use is handling rare skills, which can impact model inferences on common skills as well
Maier et al. (2021) handle this by creating a “catch all” skill for rare skills
Using average parameters from all common skills also works
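Both ways of handling rare skills are simple to express in code. The sketch below is illustrative only: the `min_count` threshold, the `"RARE"` label, and both function names are assumptions made here, not details reported by Maier et al. (2021).

```python
from collections import Counter

def remap_rare_skills(skill_tags, min_count=30, catch_all="RARE"):
    """Replace skills seen fewer than min_count times with one catch-all skill."""
    counts = Counter(skill_tags)
    return [s if counts[s] >= min_count else catch_all for s in skill_tags]

def average_parameters(common_params):
    """Mean (beta, gamma, rho) over common skills, to be used for a rare skill."""
    n = len(common_params)
    return tuple(sum(p[k] for p in common_params.values()) / n for k in range(3))

tags = ["fractions", "fractions", "fractions", "ratios"]
print(remap_rare_skills(tags, min_count=2))
print(average_parameters({"fractions": (-0.5, 0.2, 0.1), "decimals": (0.5, 0.4, 0.3)}))
```

The first approach changes the skill tagging before fitting; the second leaves the tagging alone and only substitutes parameters at prediction time.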
PFA is a competitor for measuring student skill, which predicts the probability of correctness rather than latent knowledge
Can handle multiple KCs for the same item, a big virtue
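To make the multiple-KC point concrete, here is a minimal sketch in which an item's logit is the sum of the contributions of every skill it is tagged with. The skill names, parameter values, and the function name `pfa_multi_skill` are invented for illustration.

```python
import math

def pfa_multi_skill(item_skills, params, history):
    """P(correct) for an item tagged with several skills.

    params maps skill -> (beta, gamma, rho); history maps skill -> (successes, failures).
    Skills with no history are treated as having zero prior attempts.
    """
    m = 0.0
    for skill in item_skills:
        beta, gamma, rho = params[skill]
        successes, failures = history.get(skill, (0, 0))
        m += beta + gamma * successes + rho * failures
    return 1.0 / (1.0 + math.exp(-m))

params = {"fractions": (-0.5, 0.2, 0.1), "decimals": (0.3, 0.1, 0.0)}
history = {"fractions": (1, 2)}  # one success and two failures so far on fractions
p = pfa_multi_skill(["fractions", "decimals"], params, history)
print(round(p, 2))
```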
Weights actions further back in order less strongly
Adds an evidence decay parameter δ
Substitutes

For the previous summation
Very slightly higher AUC (0.003)
Weights actions further back in order less strongly
Looks at proportion of success-failure, weighting by distance in order from current action
Adds an evidence decay parameter b
Adds “ghost practices” before current practice to make math work
Substitutes

For the previous summation
A little higher AUC (0.003-0.027) (Pavlik et al., 2021)
Creates a general framework for variants of PFA

Ongoing work on variants of PFA typically frames itself in terms of LKT components (and proposes additional components)
Examples
Fluctuation in response time (Chu & Pavlik, 2023)
Different models of memory decay and spacing effect (Maier et al., 2023)
Some items have multiple skills
Learning likely to be gradual rather than sudden
Relatively small amounts of data
You want to add new items without refitting the model