Spelling Corrector in Factor

A while ago, Peter Norvig posted a lucid explanation of how to write a spelling corrector. He did it in an impressive 21 lines of readable Python code. Here is my attempt at the same spelling corrector using the Factor programming language. It’s 19 lines if you don’t count the unusually verbose USING statement at the top.

USING: ascii assocs combinators combinators.short-circuit
concurrency.combinators histogram io io.encodings.ascii io.files
kernel locals math math.ranges math.statistics regexp sequences
sequences.deep sets strings values vectors ;
IN: spelling
VALUE: NWORDS
CONSTANT: alphabet "abcdefghijklmnopqrstuvwxyz" 

: words ( string -- seq ) >lower R/ [a-z]+/ all-matching-slices [ >string ] map ;
: train ( file -- ) ascii [ contents words histogram to: NWORDS ] with-file-reader ;

: deletes ( str -- seq ) [ length ] [ [ remove-nth ] curry ] bi map ;
: transposes ( str -- seq )
    [ length 1 - ] [ [ swap cut 1 cut 1 cut surround append ] curry ] bi { } map-as ;

:: replaces ( str -- seq )
    str length [ dup [ str remove-nth insert-nth ] 2curry alphabet swap { } map-as ] map flatten ;

:: inserts ( str -- seq )
    str length [0,b] [ [ str insert-nth ] curry alphabet swap { } map-as ] map flatten ;

: edits1 ( str -- seq )
    { [ deletes ] [ transposes ] [ replaces ] [ inserts ] } cleave union union union ;

: known ( seq -- seq/f ) [ NWORDS at ] filter [ f ] when-empty ;

: known_edits2 ( str -- seq ) edits1 [ edits1 ] map flatten prune known ;

: best ( seq -- str/f ) dup [ NWORDS at ] map dup minmax nip swap index swap nth ;

: correct ( str -- str )
    dup { [ 1vector known ] [ edits1 known ] [ known_edits2 ] } 1|| [ best nip ] when* ;
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s