KT Learning Lab 5: A Conceptual Overview
Up until this point we’ve been talking about predicting future correctness
Mostly considered in the context of memory for facts, rather than skills
How do you say banana in Spanish?
What is the capital of New York?
Where are the Islets of Langerhans?
Flashcard apps
Language learning apps
It has long been known that spaced practice (i.e. pausing between studying the same fact) is better than massed practice (i.e. cramming)
Early adaptive systems implemented this behavior in simple ways (e.g., Leitner, 1972)
It’s long been known that spaced practice, or in other words, pausing between studying the same fact, is better than massed practice (i.e. cramming for an exam).
Early adaptive systems like Leitner’s flashcards (1972) implemented this behavior in simple ways.
We start our discussion of algorithms for modeling memory with the ACT-R Memory Equations.
In Pavlik Jr & Anderson’s (2005) ACT-R memory equations, memory duration can be understood in terms of memory strength, which is sometimes referred to as activation.
\[ P(m) = \frac{1}{1+e^{\frac{\tau-m}{s}}} \]
Where m = activation strength of current fact
τ = threshold parameter for how hard it is to remember
s = noise parameter for how sensitive memory is to changes in activation
Note logistic function (like PFA)
The formula for the probability of remembering in ACT-R is based on three parameters:
m, the activation strength of the current fact
tau, the threshold parameter for how hard it is to remember
s, the noise parameter for how sensitive memory is to changes in activation. In other words, when you re-encounter a fact, how much better does your memory get?
*Note that this was not building off PFA; rather, PFA was building off of this.
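As a quick sketch of the recall equation above (the parameter values here are illustrative defaults, not estimates from the paper):

```python
import math

def p_recall(m, tau=-0.7, s=0.3):
    """Probability of recall under the ACT-R logistic equation.

    m:   activation strength of the current fact
    tau: threshold parameter for how hard it is to remember
    s:   noise parameter for sensitivity to changes in activation
    (tau and s values here are illustrative, not fitted.)
    """
    return 1.0 / (1.0 + math.exp((tau - m) / s))
```

When activation exactly equals the threshold (m = τ), recall probability is 0.5, and higher activation pushes it toward 1.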
\[ m_{n}(t_{1..n}) = \ln\left(\sum_{i=1}^{n} t_{i}^{-d}\right) \]
We have a sequence of n cases where the learner encountered the fact
Each 𝑡_𝑖 represents how long ago the learner encountered the fact for the i-th time
The decay parameter d represents the speed of forgetting under exponential decay
The activation is given by this formula, where we have a sequence of n cases in which the learner encountered the fact. Each t of i represents how long ago the learner encountered the fact for the i-th time, and the decay parameter d represents the speed of forgetting under exponential decay.
So in other words, based on the parameters of this model, we can infer how much your memory will decay over time, and how rapidly it will decay.
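The base-level activation equation above translates directly into code (ages are times since each encounter, in whatever time units you choose; d = 0.5 is the conventional ACT-R default):

```python
import math

def activation(ages, d=0.5):
    """ACT-R base-level activation: m = ln(sum_i t_i^(-d)).

    ages: list of times since each past encounter with the fact
          (each t_i in the equation), most units work as long as
          they are consistent.
    d:    decay parameter (speed of forgetting).
    """
    return math.log(sum(t ** -d for t in ages))
```

Note the implications baked into the sum: adding another encounter always raises activation, and more recent encounters (smaller t_i) contribute more than older ones.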
Implications
More practice = better memory
More time between practices = better memory
Most efficient learning comes from dense practice followed by expanding amounts of time in between practices (Pavlik & Anderson, 2008)
There are a couple of implications for the ACT-R memory equations:
First, more practice equals better memory. That’s an implication here, and it’s generally true in the real world: you’re more likely to remember something if you encounter it more often
Also, more time between practices equals better memory. That’s true of almost all the memory models. But one kind of interesting implication of Pavlik Jr & Anderson’s model (2005) is that the most efficient learning comes from dense practice followed by expanding amounts of time in between practices (Pavlik Jr & Anderson, 2008).
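The expanding-practice result comes from the part of Pavlik Jr & Anderson’s (2005) model where each trace’s decay depends on the activation at the moment of that practice (d_i = c·e^(m at previous practice) + a): practicing while memory is weak produces a slower-decaying trace. A sketch of that recurrence, with illustrative parameter values:

```python
import math

def activation_at(test_time, practice_times, c=0.25, a=0.04):
    """Activation at test_time given practices at practice_times
    (increasing, all before test_time), with activation-dependent
    decay per trace: d_i = c * exp(m_prev) + a, where m_prev is the
    activation just as the i-th practice happened.
    c and a values here are illustrative, not fitted."""
    decays = []
    for i, tp in enumerate(practice_times):
        if i == 0:
            m_prev = -math.inf  # no prior traces: exp(-inf) = 0, so d_1 = a
        else:
            m_prev = math.log(sum(
                (tp - q) ** -dq
                for q, dq in zip(practice_times[:i], decays)))
        decays.append(c * math.exp(m_prev) + a)
    return math.log(sum(
        (test_time - tp) ** -dj
        for tp, dj in zip(practice_times, decays)))
```

With the same number of practices, a spread-out schedule ends up with higher activation at a delayed test than a massed one, which is the spacing effect the model is built to capture.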
Postulates that decay speed drops the more times a fact is encountered
Functionally complex model where
Knowledge strength (and therefore probability of remembering) is a function of the sum of the traces’ actual contributions, divided by the product of their potential contributions
Power function is estimated as a combination of exponential functions
A more recent competitor to the ACT-R memory equations is MCM, by Mozer and his colleagues (2009). This model postulates that the decay speed drops the more times a fact is encountered. So in ACT-R, the decay speed is constant whether you’ve encountered something one time or a million times.
But in MCM, the more times you’ve encountered a fact, the slower it is to decay.
MCM is represented by a functionally complex model where knowledge strength, and therefore the probability of remembering, is a function of the sum of the traces’ actual contributions divided by the product of their potential contributions. A power function is estimated as a combination of exponential functions. Each encounter with the knowledge has an exponential function for decay, but it turns out to sum up to a power function.
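That last point, a power function emerging from a combination of exponential functions, can be seen in a toy sketch (this is not the full MCM, just an equal-weight mixture of exponentially decaying traces at different time scales):

```python
import math

def trace_mixture(t, rates=(1.0, 0.1, 0.01)):
    """Toy illustration of MCM's core idea: several memory traces,
    each decaying exponentially but at a different time scale.
    Their mixture forgets much more slowly than any single fast
    trace, approximating power-law forgetting."""
    return sum(math.exp(-r * t) for r in rates) / len(rates)
```

At short delays the fast traces dominate the loss; at long delays the slow traces keep strength from collapsing, which is what gives the heavy, power-like tail.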
DASH extends previous approaches to also include item difficulty and latent student ability
Can use either MCM or ACT-R as its internal representation of how memory decays over time
Building on that, Mozer & Lindsey (2016) introduced the DASH framework, which extends previous approaches to also include item difficulty and latent student ability
DASH has a neat feature: it can use MCM, ACT-R, or other frameworks as its internal representation of how memory decays over time
So whichever of ACT-R or MCM you like better, you can use DASH to also include item difficulty and latent student ability in your estimate of student forgetting and memory over time
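A rough sketch of how DASH combines those three ingredients (this is a simplification, not the exact DASH parameterization; `memory_term` stands in for whatever inner memory model, ACT-R-style or MCM-style, supplies the study-history/decay component):

```python
import math

def p_recall_dash(ability, difficulty, memory_term):
    """DASH-style combination (sketch): recall probability from
    latent student ability, item difficulty, and a decay term
    produced by a pluggable inner memory model. The additive
    logistic form mirrors DASH; the feature construction here
    is simplified for illustration."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty + memory_term)))
```

Higher ability or a stronger memory term pushes recall probability up; a harder item pushes it down, all on the same logistic scale.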
Fits regression model to predict both recall and estimated half-life of memory (based on lag time)
Based on estimate of exponential decay of memory
Also very recently, Duolingo fits a regression model to predict both the recall and the estimated half-life of memory based on the lag time. It’s based on an estimate of the exponential decay of memory.
Uses feature set including
Time since word last seen
Total number of times student has seen the word
Total number of times student has correctly recalled the word
Total number of times student has failed to recall the word
Word difficulty
But Duolingo does this calculation (Settles & Meeder, 2016):
Not based on the kind of complex algorithms that are recursive or iterative in nature, like those seen in Pavlik or Mozer, but instead uses a feature set including the time since the word was last seen, the total number of times the student has seen the word, the total number of times the student has correctly recalled the word or failed to recall the word, and the word difficulty
So it tries to capture some of the same ideas as DASH in a formulation that is quicker to implement, and quicker to run in real-time
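The half-life regression formulation (Settles & Meeder, 2016) is compact enough to sketch directly: the estimated half-life is h = 2^(θ·x) over the feature vector, and recall probability is p = 2^(−Δ/h). The feature names and weight values below are made up for illustration:

```python
def predicted_recall(delta_days, features, weights):
    """Half-life regression sketch (Settles & Meeder, 2016):
    estimated half-life h = 2^(theta . x); recall p = 2^(-delta/h).
    delta_days: lag time since the word was last seen, in the same
    time units as the half-life. Feature names/weights are
    hypothetical examples, not Duolingo's fitted model."""
    h = 2.0 ** sum(weights[k] * x for k, x in features.items())
    return 2.0 ** (-delta_days / h)
```

For example, with hypothetical weights `{'times_seen': 0.2, 'times_correct': 0.1, 'difficulty': -0.3}` and features `{'times_seen': 5, 'times_correct': 3, 'difficulty': 1.0}`, θ·x = 1.0, so the half-life is 2 days and predicted recall at a 2-day lag is exactly 0.5. No recursion over the full practice history is needed, which is what makes it quick to run in real time.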
Spreading Activation
Encountering or recalling something in memory also increases memory activation of related concepts/facts/ideas (Anderson, 1983)
Ma, Hettiarachchi, Fukui, & Ando (2023) build a DKT-family algorithm for memory that uses associations between items along these lines
You care about memory for specific items
Forgetting is a real concern – the student can do it today, not tomorrow
Relatively small amounts of data OK
Once you have a memory model, you can safely add new items to it and it will work