ColumnVectorizer
class creating sparse matrices from a corpus
Constructor Summary
Public Constructor | ||
public |
constructor(options: Object): this creates a new instance for classifying text data for machine learning |
Member Summary
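Public Members | ||
public |
data |
|
public |
limitedFeatures |
|
public |
matrix |
|
public |
maxFeatures |
|
public |
replacer |
|
public |
sortedWordCount |
|
public |
tokens |
|
public |
vectors |
|
public |
wordCountMap |
|
public |
wordMap |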
Method Summary
Public Methods | ||
public |
evaluate(testString: String): number[][] returns new matrix of words with counts in columns |
|
public |
evaluateString(testString: String): Object returns word map with counts |
|
public |
fit_transform(options: Object) Fits and transforms data by creating column vectors (a sparse matrix where each row represents one entry in the corpus, each column represents a word from the corpus, and each cell holds the count of that word in that entry) |
|
public |
get_limited_features(options: *) Returns limited sets of dependent features or all dependent features sorted by word count |
|
public |
get_tokens(): String[] Returns a distinct array of all tokens |
|
public |
get_vector_array(): String[] Returns array of arrays of strings for dependent features from sparse matrix word map |
Public Constructors
public constructor(options: Object): this
creates a new instance for classifying text data for machine learning
Params:
Name | Type | Attribute | Description |
options | Object |
|
Return:
this |
Example:
const dataset = new ms.nlp.ColumnVectorizer(csvData);
Public Members
public data
public limitedFeatures
public matrix
public maxFeatures
public replacer
public sortedWordCount
public tokens
public vectors
public wordCountMap
public wordMap
Public Methods
public evaluate(testString: String): number[][]
returns new matrix of words with counts in columns
Params:
Name | Type | Attribute | Description |
testString | String |
Return:
number[][] | sparse matrix row for new classification predictions |
Example:
ColumnVectorizer.evaluate('I would rate everything Great, views Great, food Great') => [ [ 0, 1, 3, 0, 0, 0, 0, 0, 1 ] ]
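As an illustration of the row layout above, the following standalone sketch lays a word-count map out as a single matrix row. It does not call ColumnVectorizer; `toMatrixRow`, the column order, and the sample counts are hypothetical and chosen only to show how each column position maps to one feature.

```javascript
// Hypothetical helper, not the library's code: lays a word-count map out as a
// single matrix row, one column per feature, in a fixed column order.
function toMatrixRow(columns, counts) {
  // Features missing from the map get 0, so the row length always matches
  // the number of columns. Words not listed in `columns` are dropped.
  const row = columns.map((feature) =>
    Object.prototype.hasOwnProperty.call(counts, feature) ? counts[feature] : 0
  );
  return [row]; // wrapped in an outer array, giving a number[][] shape
}

const columns = ['good', 'view', 'great', 'rude'];
const row = toMatrixRow(columns, { great: 3, view: 1, food: 1 });
console.log(row); // [ [ 0, 1, 3, 0 ] ]
```

Because the column order is fixed, a row produced for a new string lines up with the rows of the training matrix, which is what makes it usable for predictions.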
public evaluateString(testString: String): Object
returns word map with counts
Params:
Name | Type | Attribute | Description |
testString | String |
Return:
Object | object of corpus words with counts |
Example:
ColumnVectorizer.evaluateString('I would rate everything Great, views Great, food Great') => { realli: 0,
good: 0,
definit: 0,
recommend: 0,
wait: 0,
staff: 0,
rude: 0,
great: 3,
view: 1,
food: 1,
not: 0,
cold: 0,
took: 0,
forev: 0,
seat: 0,
time: 0,
prompt: 0,
attent: 0,
bland: 0,
flavor: 0,
kind: 0 }
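The shape of this result, with every corpus word present and unseen words left at zero, can be reproduced with a small standalone sketch. It does not use ColumnVectorizer: `countAgainstVocabulary` and the regex tokenizer are hypothetical, and the stemming visible in the output above (`realli`, `definit`) is deliberately left out.

```javascript
// Hypothetical helper, not part of ColumnVectorizer: counts how often each
// known vocabulary word occurs in a test string. Words outside the vocabulary
// are ignored; vocabulary words that never occur stay at 0.
function countAgainstVocabulary(vocabulary, testString) {
  // Assumption: lowercase and split on non-letters; no stemming or stop words.
  const tokens = testString.toLowerCase().match(/[a-z]+/g) || [];
  const counts = {};
  for (const word of vocabulary) counts[word] = 0;
  for (const token of tokens) {
    if (Object.prototype.hasOwnProperty.call(counts, token)) counts[token] += 1;
  }
  return counts;
}

const wordCounts = countAgainstVocabulary(
  ['good', 'great', 'food', 'rude'],
  'Great food, GREAT staff, great'
);
console.log(wordCounts); // { good: 0, great: 3, food: 1, rude: 0 }
```

Note that `staff` is dropped because it is not in the sample vocabulary, just as words absent from the fitted corpus do not appear in the map above.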
public fit_transform(options: Object)
Fits and transforms data by creating column vectors (a sparse matrix where each row represents one entry in the corpus, each column represents a word from the corpus, and each cell holds the count of that word in that entry)
Params:
Name | Type | Attribute | Description |
options | Object | ||
options.data | Object[] | array of corpus data |
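A minimal sketch of the general fit-and-transform idea follows. It is not ColumnVectorizer's implementation: `tokenize` and `fitTransformSketch` are hypothetical names, the corpus is assumed to be an array of plain strings rather than objects, and no stemming or stop-word removal is applied.

```javascript
// Hypothetical sketch of count vectorization; not ColumnVectorizer's own code.
// Assumption: documents are plain strings, tokenized by lowercasing and
// splitting on non-letters.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

function fitTransformSketch(documents) {
  // Fit: collect distinct tokens in first-seen order; these become the columns.
  const vocabulary = [...new Set(documents.flatMap(tokenize))];
  const columnIndex = new Map(vocabulary.map((word, i) => [word, i]));

  // Transform: one row per document, one count per vocabulary column.
  const matrix = documents.map((doc) => {
    const row = new Array(vocabulary.length).fill(0);
    for (const token of tokenize(doc)) row[columnIndex.get(token)] += 1;
    return row;
  });
  return { vocabulary, matrix };
}

const fitted = fitTransformSketch(['great food', 'rude staff', 'great great view']);
console.log(fitted.vocabulary); // [ 'great', 'food', 'rude', 'staff', 'view' ]
console.log(fitted.matrix);
// [ [ 1, 1, 0, 0, 0 ], [ 0, 0, 1, 1, 0 ], [ 2, 0, 0, 0, 1 ] ]
```

Most cells are zero because each document uses only a few of the corpus words, which is why the result is described as a sparse matrix.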
public get_limited_features(options: *)
Returns limited sets of dependent features or all dependent features sorted by word count
Params:
Name | Type | Attribute | Description |
options | * | ||
options.maxFeatures | number | max number of features |
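Sorting by word count and capping the result at a maximum number of features might look like the sketch below. This is an assumption-laden illustration, not the library's code: `limitFeatures` is a hypothetical name, and returning `[word, count]` pairs is a choice made here, not a documented return format.

```javascript
// Hypothetical helper, not the library's implementation: ranks features by
// total count and optionally keeps only the top `maxFeatures`.
function limitFeatures(wordCounts, maxFeatures) {
  // Highest count first; Array.prototype.sort is stable, so ties keep the
  // insertion order of the input object.
  const sorted = Object.entries(wordCounts).sort(([, a], [, b]) => b - a);
  // With no limit given, return every feature in sorted order.
  return maxFeatures === undefined ? sorted : sorted.slice(0, maxFeatures);
}

const totals = { food: 4, great: 9, rude: 2, view: 4 };
console.log(limitFeatures(totals, 2)); // [ [ 'great', 9 ], [ 'food', 4 ] ]
```

Limiting features this way keeps the most frequent words as columns and shortens every row of the matrix accordingly.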
public get_tokens(): String[]
Returns a distinct array of all tokens
Return:
String[] | returns a distinct array of all tokens |
public get_vector_array(): String[]
Returns array of arrays of strings for dependent features from sparse matrix word map
Return:
String[] | returns array of dependent features for DataSet column matrices |