If person X buys diapers, person X buys beer
Purchases occur at the same time
If person X takes Intro Stats now, person X takes Advanced Data Mining in a later semester
Conclusion: recommend Advanced Data Mining to students who have previously taken Intro Stats
Doesn’t matter if they take other courses in between
Learners in virtual environments have different sequences of behavior depending on their degree of self-regulated learning
High self-regulated learning: Tend to gather information and then immediately record it carefully
Low self-regulated learning: Tend to gather more information without pausing to record it (Sabourin, Mott, & Lester, 2011)
If-then elements do not need to occur in the same data point
Instead
If-then elements should involve the same student (or other organizing variable, like teacher or school)
If elements can be within a certain time window of each other
Then element should occur within a certain time window after the if elements
Support calculated as number of sequences that contain subsequence, divided by total number of sequences
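The support definition above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each student has one ordered list of single events, gaps between matched events are allowed, and no time window is enforced. The `courses` data and the function names are hypothetical, not taken from the lecture.

```python
from typing import Dict, List, Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if the pattern's events appear in order in the sequence (gaps allowed)."""
    events = iter(sequence)
    # `step in events` consumes the iterator up to the first match,
    # so each later step has to be found after the previous one.
    return all(step in events for step in pattern)


def support(sequences: Dict[str, List[str]], pattern: Sequence[str]) -> float:
    """Number of sequences containing the pattern, divided by total sequences."""
    if not sequences:
        return 0.0
    hits = sum(contains_subsequence(seq, pattern) for seq in sequences.values())
    return hits / len(sequences)


# Hypothetical data: one ordered list of courses per student.
courses = {
    "s1": ["Intro Stats", "Databases", "Advanced Data Mining"],
    "s2": ["Intro Stats", "Advanced Data Mining"],
    "s3": ["Advanced Data Mining", "Intro Stats"],
    "s4": ["Databases", "Intro Stats"],
}

print(support(courses, ["Intro Stats", "Advanced Data Mining"]))  # 0.5
```

Student s1 counts even though another course sits in between, and s3 does not count because the order is reversed.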
Classic Algorithm for SPM
Bob: {GAMING and BORED,
OFF-TASK and BORED,
ON-TASK and BORED,
GAMING and BORED,
GAMING and FRUSTRATED,
ON-TASK and BORED}
Bob: {GAMING and BORED 5:05:20,
OFF-TASK and BORED 5:05:40,
ON-TASK and BORED 5:06:00,
GAMING and BORED 5:06:20,
GAMING and FRUSTRATED 5:06:40,
ON-TASK and BORED 5:07:00}
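With timestamps attached, the time-window requirement can be checked directly. The sketch below uses Bob's observations from above; the function name `follows_within` and the window sizes are assumptions chosen for illustration, and the events are assumed to be sorted by time already.

```python
from datetime import datetime, timedelta

# Bob's timestamped observations from the example above.
bob = [
    ("GAMING and BORED", "5:05:20"),
    ("OFF-TASK and BORED", "5:05:40"),
    ("ON-TASK and BORED", "5:06:00"),
    ("GAMING and BORED", "5:06:20"),
    ("GAMING and FRUSTRATED", "5:06:40"),
    ("ON-TASK and BORED", "5:07:00"),
]


def parse(ts):
    """Parse an H:MM:SS clock time into a datetime (the date part is a dummy)."""
    return datetime.strptime(ts, "%H:%M:%S")


def follows_within(events, if_event, then_event, window_seconds):
    """True if some then_event occurs after some if_event, within the window."""
    window = timedelta(seconds=window_seconds)
    for i, (label, ts) in enumerate(events):
        if label != if_event:
            continue
        start = parse(ts)
        # Only look at events recorded after this if_event.
        for later_label, later_ts in events[i + 1:]:
            gap = parse(later_ts) - start
            if later_label == then_event and timedelta(0) < gap <= window:
                return True
    return False


print(follows_within(bob, "GAMING and BORED", "ON-TASK and BORED", 40))  # True
print(follows_within(bob, "GAMING and BORED", "ON-TASK and BORED", 20))  # False
```

The same pair of events counts as a pattern with a 40-second window but not with a 20-second one, which is why the window has to be chosen before mining.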
Take the whole set of sequences of length 1
Find which sequences of length 1 have support over pre-chosen threshold
Compose potential sequences out of pairs of sequences of length 1 with acceptable support
Find which sequences of length 2 have support over pre-chosen threshold
Compose potential sequences out of triplets of sequences of length 1 and 2 with acceptable support
Continue until no new sequences found
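The level-by-level procedure above might look roughly like the following Python sketch. It is a simplification, not the full classic algorithm: each sequence is a list of single events, candidates are built by appending one frequent event to an already-frequent pattern rather than by the join-and-prune step of the published algorithm, and time windows are ignored. The `logs` data and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def contains(sequence, pattern):
    """True if the pattern's events appear in order in the sequence (gaps allowed)."""
    events = iter(sequence)
    return all(step in events for step in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def mine_sequences(sequences: List[List[str]], min_support: float) -> Dict[Pattern, float]:
    if not sequences:
        return {}
    # Length 1: keep the single events whose support meets the threshold.
    items = sorted({event for seq in sequences for event in seq})
    level = {(e,): support(sequences, (e,)) for e in items}
    level = {p: s for p, s in level.items() if s >= min_support}
    frequent_items = [p[0] for p in level]
    results = dict(level)
    # Longer patterns: extend each surviving pattern by one frequent event,
    # keep the extensions that meet the threshold, stop when none survive.
    while level:
        next_level = {}
        for pattern in level:
            for item in frequent_items:
                candidate = pattern + (item,)
                s = support(sequences, candidate)
                if s >= min_support:
                    next_level[candidate] = s
        results.update(next_level)
        level = next_level
    return results


# Hypothetical behavior logs, one per student.
logs = [
    ["GAMING", "OFF-TASK", "ON-TASK"],
    ["GAMING", "ON-TASK"],
    ["OFF-TASK", "GAMING", "ON-TASK"],
    ["ON-TASK", "GAMING"],
]
found = mine_sequences(logs, min_support=0.75)
for pattern, s in found.items():
    print(pattern, s)
```

Because an extension can never have higher support than the pattern it extends, patterns that fail the threshold at one length are never revisited at the next.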
With min support = 20%
Chuck: a, abc, ac, de, cef
Darlene: af, ab, acd, dabc, ef
Egoberto: aef, ab, aceh, d, ae
Francine: a, bc, acf, d, abeg
a, b, c, d, e, f
a, b, c, d, e, f, ac
a, b, c, d, e, f, ac (14/40=35%)
a, b, c, d, e, f, ac, ad, ae
a, b, c, d, e, f, ac, ad, ae, aad
a, b, c, d, e, f, ac, ad, ae, aad, aae, ade
From
ac, ad, ae, aad, aae, ade
To
a ➔ c, a ➔ d, a ➔ e, a ➔ ad, a ➔ ae, ad ➔ e
FreeSpan
PrefixSpan
SPADE
Faster, but same basic idea as in GSP
Compares the support for sequential patterns between two groups
Such as high-performing and low-performing students
To find the patterns that are much more common in one group than the other
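As a rough illustration of the comparison described above, the sketch below computes the support of each candidate pattern in two groups and ranks the patterns by the size of the gap. The group data, the pattern list, and the function names are all hypothetical, and a fuller analysis would likely also check whether a gap is statistically reliable, which is not shown here.

```python
def contains(sequence, pattern):
    """True if the pattern's events appear in order in the sequence (gaps allowed)."""
    events = iter(sequence)
    return all(step in events for step in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain the pattern (0.0 for an empty group)."""
    if not sequences:
        return 0.0
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def support_gap(group_a, group_b, patterns):
    """Rows of (pattern, support_a, support_b, difference), largest gap first."""
    rows = []
    for pattern in patterns:
        sa = support(group_a, pattern)
        sb = support(group_b, pattern)
        rows.append((pattern, sa, sb, sa - sb))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


# Hypothetical action logs for two groups of students.
high_performing = [
    ["READ", "NOTE", "READ", "NOTE"],
    ["READ", "NOTE"],
    ["READ", "READ", "NOTE"],
]
low_performing = [
    ["READ", "READ", "READ"],
    ["READ", "NOTE"],
    ["READ", "READ"],
]
candidates = [("READ", "NOTE"), ("READ", "READ")]

rows = support_gap(high_performing, low_performing, candidates)
for pattern, sa, sb, diff in rows:
    print(pattern, round(sa, 2), round(sb, 2), round(diff, 2))
```

In this toy data, reading followed by note-taking separates the groups, while repeated reading is equally common in both.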
Some contemporary algorithms attempt to find groups of patterns that can be described concisely
See review in Fournier-Viger et al. (2017)
Related algorithm
Rather than just finding small, local patterns
Tries to find overarching processes that occur over the course of a set of events, or tries to find deviations from approved processes
Comments? Questions?