Module 2: Logistic Knowledge Tracing and Performance Factors Analysis

KT Learning Lab 2: A Conceptual Overview

Logistic Knowledge Tracing

A broad framework for knowledge tracing models based on logistic regression (Pavlik, Eglington, & Harrell-Williams, 2021)

Performance Factors Analysis

First member of the LKT family that ran in real-time (Pavlik et al., 2009)

PFA

Measures how much latent skill a student has, while they are learning
- But expresses it in terms of probability of correctness, the next time the skill is encountered
- No direct expression of the amount of latent skill, except this probability of correctness

What is the typical use of PFA?

Assess a student’s knowledge of topic X
Based on a sequence of items that are dichotomously scored
- E.g. the student can get a score of 0 or 1 on each item
Where the student can learn on each item, due to help, feedback, scaffolding, etc.

How does PFA differ from BKT?

Key assumptions

Each item may involve multiple latent skills or knowledge components
- Different from BKT
Each skill has success learning rate γ and failure learning rate ρ
- Different from BKT where learning rate is the same, success or failure

Key assumptions

There is also a difficulty parameter β, but its semantics can vary – more on this later
From these parameters, and the number of successes and failures the student has had on each relevant skill so far, we can compute the probability P(m) that the learner will get the item correct

PFA

Let’s go over what each of these parameters means

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
	-0.5	0.38

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
	-0.5+(0.1)*1

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
	-0.4	0.40

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40
	-0.5+(0.1*2)

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40
	-0.3	0.43

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40
1	-0.3	0.43
	-0.5+(0.12)+(0.21)

Reasonable Example

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40
1	-0.3	0.43
	-0.1	0.48

How Does PFA

Represent when the student learns from an opportunity to practice?
As opposed to just better predicted performance because you’ve gotten it right

How Does PFA

γ = success learning rate, ρ = failure learning rate, β = difficulty

Represent when the student learns from an opportunity to practice?
As opposed to just better predicted performance because you’ve gotten it right
Is it ρ ?
Is it average of ρ and γ?

Degeneracy in PFA Maier, Baker, & Stalzer (2021)

γ = success learning rate, ρ = failure learning rate, β = difficulty

Three degenerate cases
- γ < 0
- γ < ρ
- γ = ρ = 0

What do each of these mean?

γ = success learning rate, ρ = failure learning rate, β = difficulty

When might you legitimately get them?
ρ < 0
γ < ρ
γ < 0

Degeneracy in PFA Maier et al. (2021)

γ = success learning rate, ρ = failure learning rate, β = difficulty

Three degenerate cases
- γ < 0
- γ < ρ
- γ = ρ = 0
One seemingly degenerate (but not) case
- ρ > 0
“It is worth noting that a fourth case when ρ > 0 – is not degenerate, due to the multiple functions the parameters perform in PFA. In this case, the rate of learning the skill may outweigh the evidence of lack of student knowledge that an incorrect answer provides. So long as γ > ρ, a positive ρ is conceptually acceptable.”

Degenerate Example (Case 1)

γ = -0.1, ρ = -0.5, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-1	0.27
1	-1.5	0.18
	-1.6	0.17

Degenerate Example (Case 2)

γ = 0.1, ρ = 0.2, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.3	0.43
1	-0.1	0.48
	0	0.5

Note

γ = success learning rate, ρ = failure learning rate, β = difficulty

Values of ρ below 0 don’t actually mean negative learning
They mean that failure provides more evidence on lack of knowledge
Than the learning opportunity causes improvement

Addressing Degeneracy Maier et al. (2021)

γ = success learning rate, ρ = failure learning rate, β = difficulty

Simply bound γ and ρ
Does not reduce model performance substantially (just like BKT)
What causes degeneracy? We’ll come back to this in a minute

Note

Parameters in PFA combine information from correctness with improvement from practice improvement
Makes PFA models a little harder to interpret than BKT

Adjusting β

γ = 0.2, ρ = 0.1, β = -0.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-0.5	0.38
0	-0.4	0.40
1	-0.3	0.43
	-0.1	0.48

Adjusting β

γ = 0.2, ρ = 0.1, β = -1.5

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	-1.5	0.18
0	-1.4	0.20
1	-1.3	0.21
	-1.1	0.25

Adjusting β

γ = 0.2, ρ = 0.1, β = +3.0

γ = success learning rate, ρ = failure learning rate, β = difficulty

Actual	m	P(m)
0	3.0	0.953
0	3.1	0.957
1	3.2	0.961
	3.4	0.968

β Parameters

β = difficulty

Pavlik proposes three different β Parameters
- Item
- Item-Type
- Skill
Result in different number of parameters
- And greater or lesser potential concern about over-fitting
What are the circumstances where you might want item versus skill?

Causes of Degeneracy Maier et al. (2021)

γ = success learning rate, ρ = failure learning rate, β = difficulty

If β is used at the Skill or Item-Type level
And the learning system moves students from easier to harder items within a “skill”
Then γ < 0.
Also, if items are tagged with multiple skills, shared variance (collinearity) between skills could produce degenerate parameters.

Fitting PFA

Typically Expectation Maximization is used

Expectation Maximization

Starts with initial values for each parameter
Estimates student correctness at each problem step
Estimates params using student correctness estimates
If goodness is substantially better than last time it was estimated, and max iterations has not been reached, go to step 2

Expectation Maximization

EM is vulnerable to local minima
Randomized restart typically used

Is PFA better than BKT?

Approximately equal predictive power across a lot of studies (Pavlik et al., 2009; Gong et al., 2010; Baker et al., 2011; Pardos et al., 2011, 2012)
Different virtues and flaws – choose the one that better fits your goals

Is PFA used in the real world?

Yes, but far fewer learning systems than BKT
Maier et al. (2021) discuss its use in Reveal Math 1

Using PFA in the real world

One issue in real-world use is handling rare skills, which can impact model inferences on common skills as well
- Because PFA is used in cases with items tagged to multiple skills
Maier et al. (2021) handle this by creating a “catch all” skill for rare skills
Using average parameters from all common skills also works

Final Thoughts on (original)PFA

PFA is a competitor for measuring student skill, which predicts the probability of correctness rather than latent knowledge
Can handle multiple KCs for the same item, a big virtue

Beyond PFA

Can we improve PFA based on what the field knows about learner memory?

PFA-Decay Gong, Beck, & Heffernan (2011)

Weights actions further back in order less strongly
Adds an evidence decay parameter δ
Substitutes
For the previous summation
Very slightly higher AUC (0.003)

R-PFA Galyardt & Goldin (2014)

Weights actions further back in order less strongly
Looks at proportion of success-failure, weighting by distance in order from current action
Adds an evidence decay parameter b
Adds “ghost practices” before current practice to make math work
Substitutes
For the previous summation
A little higher AUC (0.003-0.027) (Pavlik et al., 2021)

LKT Pavlik, Eglington, & Harrell-Williams (2021)

Creates a general framework for variants of PFA

Features supported by LKT package

LKT (Pavlik et al., 2021)

Ongoing work on variants to PFA typically frames itself in terms of LKT components (and proposes additional components)
Examples
- Fluctuation in response time(Chu & Pavlik, 2023)
- Different models of memory decay and spacing effect(Maier et al., 2023).

When to use LKT

Some items have multiple skills
Learning likely to be gradual rather than sudden
Relatively small amounts of data
You want to add new items without refitting the model

References

Galyardt, A., & Goldin, I. (2014). Recent-performance factors analysis. Educational Data Mining 2014.

Gong, Y., Beck, J. E., & Heffernan, N. T. (2011). How to construct more accurate student models: Comparing and optimizing knowledge tracing and performance factor analysis. International Journal of Artificial Intelligence in Education, 21(1-2), 27–46.

Maier, C., Baker, R. S., & Stalzer, S. (2021). Challenges to applying performance factor analysis to existing learning systems. Proceedings of the 29th International Conference on Computers in Education.

Pavlik, P. I., Eglington, L. G., & Harrell-Williams, L. M. (2021). Logistic knowledge tracing: A constrained framework for learner modeling. IEEE Transactions on Learning Technologies, 14(5), 624–639.