DataSet
class for manipulating an array of objects, typically from CSV data
Static Method Summary
Static Public Methods | ||
public static |
columnArray(name: string, options: *): array returns a new array of a selected column from an array of objects, can filter, scale and replace values |
|
public static |
columnMatrix(vectors: Array, data: Array): Array returns a matrix of values by combining column arrays into a matrix |
|
public static |
encodeObject(data: Object, options: {labels: Array<String>, prefix: String, name: String}): Object Converts an object into a one-hot encoded object |
|
public static |
getBinaryValue(value: String | Number): Number returns 0 or 1 depending on the input value |
|
public static |
getTransforms(transforms: Object): Array<Object> Allows for fit transform shorthand notation |
|
public static |
mapToObject(mapObj: Map): Object returns a JavaScript Object from a Map (supports nested Map Objects) |
|
public static |
oneHotDecoder(name: string, options: *): Array<Object> Returns the original column values decoded from one-hot encoded data |
|
public static |
oneHotEncoder(name: string, options: *): Object returns a new object of one hot encoded values |
|
public static |
reverseColumnMatrix(options: *): Object[] returns an array of objects by applying labels to matrix of columns |
|
public static |
reverseColumnVector() |
|
public static |
selectColumns(names: String[], options: *): Object[] returns a list of objects with only selected columns as properties |
Constructor Summary
Public Constructor | ||
public |
constructor(dataset: Object[]): this creates a new raw data instance for preprocessing data for machine learning |
Method Summary
Public Methods | ||
public |
columnDescale(name: string): number[] Returns a new array of descaled values |
|
public |
columnMerge(name: String, data: Array): Object it returns a new column that is merged onto the data set |
|
public |
columnReducer(name: String, options: Object): Object returns a new column object by reducing an existing column; used in data preparation to create calculated columns for aggregate statistics |
|
public |
columnReplace(name: string, options: *): array | Object[] returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values |
|
public |
columnScale(name: string): number[] Returns a new array of scaled values which can be reversed (descaled). |
|
public |
exportFeatures(filter: Function): {labels: Map, encoders: Map, scalers: map} returns Object of all encoders and scalers |
|
public |
filterColumn(filter: Function): Array returns filtered rows of data |
|
public |
fitColumns(): Object[] mutates data property of DataSet by replacing multiple columns in a single command |
|
public |
fitInverseTransforms(options: *) Mutates dataset data by inverting all transforms |
|
public |
fitTransforms(options: *) Mutate dataset data with all transforms |
|
public |
importFeatures(features: {labels: Map, encoders: Map, scalers: map}) set encoders, labels and scalers |
|
public |
inverseTransformObject(data: *, options: *): Object Inverts the transforms applied to an object |
|
public |
labelDecode(name: string, options: *): array returns a new array and decodes an encoded column back to the original array values |
|
public |
labelEncoder(name: string, options: *): array returns a new array and label encodes a selected column |
|
public |
oneHotColumnArray(name: string, options: *): Array<Object> Return one hot encoded data |
|
public |
transformObject(data: *, options: *): Object transforms an object and replaces values that have been scaled or encoded |
Static Public Methods
public static columnArray(name: string, options: *): array
returns a new array of a selected column from an array of objects; can filter, scale and replace values
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options | * | | |
options.prefilter | function | | prefilter values to return |
options.filter | function | | filter values to return |
options.replace.test | function | | test function for replacing values (arr[val]) |
options.replace.value | string | number | function | | value to replace (arr[val]) if replace test is true; if a function, (result,val,index,arr,name)=>your custom value |
options.parseIntBase | number | | radix value for parseInt |
options.parseFloat | boolean | | converts values to floats |
options.parseInt | boolean | | converts values to ints |
options.scale | boolean | | standard or minmax feature scale values |
Return:
array |
Example:
// columnArray returns a column of data by name
// [ '44','27','30','38','40','35','','48','50', '37' ]
const OriginalAgeColumn = dataset.columnArray('Age');
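The options above can be pictured with a small standalone sketch. This is not the library's implementation: `columnArraySketch` is a hypothetical helper, it only models a static `replace.value`, and it assumes that `prefilter` sees whole rows, that replacement happens before parsing, and that `filter` runs last.

```javascript
// Hypothetical stand-in for columnArray; not the library's actual code.
function columnArraySketch(data, name, options = {}) {
  const { prefilter, filter, replace, parseIntBase = 10 } = options;
  // Assumption: prefilter receives rows, filter receives extracted values.
  const rows = prefilter ? data.filter(prefilter) : data;
  let column = rows.map((row) => row[name]);
  if (replace) {
    // Only static replacement values are modelled here.
    column = column.map((val) => (replace.test(val) ? replace.value : val));
  }
  if (options.parseInt) column = column.map((val) => parseInt(val, parseIntBase));
  if (options.parseFloat) column = column.map((val) => parseFloat(val));
  return filter ? column.filter(filter) : column;
}

const rows = [{ Age: '44' }, { Age: '' }, { Age: '30' }];
const ages = columnArraySketch(rows, 'Age', {
  replace: { test: (val) => val === '', value: '0' },
  parseInt: true,
});
console.log(ages); // [ 44, 0, 30 ]
```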
public static columnMatrix(vectors: Array, data: Array): Array
returns a matrix of values by combining column arrays into a matrix
Params:
Name | Type | Attribute | Description |
vectors | Array | | array of arguments for columnArray to merge columns into a matrix |
data | Array | | array of data to convert to matrix |
Return:
Array | a matrix of column values |
Example:
const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
csvObj.columnMatrix([['col1',{parseInt:true}],['col2']]); // =>
//[
// [1,5],
// [2,6],
//]
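As a rough illustration of how column arguments turn into a row-major matrix, here is a minimal sketch. `columnMatrixSketch` is a hypothetical name, and only the `parseInt` option is modelled; the real method delegates to columnArray and may behave differently.

```javascript
// Hypothetical sketch: each vector is [columnName, options?].
function columnMatrixSketch(data, vectors) {
  return data.map((row) => vectors.map(([name, options = {}]) => {
    const value = row[name];
    // Only the parseInt option is modelled in this sketch.
    return options.parseInt ? parseInt(value, 10) : value;
  }));
}

const matrix = columnMatrixSketch(
  [{ col1: '1', col2: 5 }, { col1: '2', col2: 6 }],
  [['col1', { parseInt: true }], ['col2']]
);
console.log(matrix); // [ [ 1, 5 ], [ 2, 6 ] ]
```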
public static encodeObject(data: Object, options: {labels: Array<String>, prefix: String, name: String}): Object
Converts an object into a one-hot encoded object
Params:
Name | Type | Attribute | Description |
data | Object | | object to encode |
options | {labels: Array<String>, prefix: String, name: String} | | encoded object options |
Return:
Object | one-hot encoded object |
Example:
const labels = ['apple', 'orange', 'banana',];
const prefix = 'fruit_';
const name = 'fruit';
const options = { labels, prefix, name, };
const data = {
fruit: 'apple',
};
DataSet.encodeObject(data, options); // => { fruit_apple: 1, fruit_orange: 0, fruit_banana: 0, }
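The encoding shown above can be sketched in a few lines. `encodeObjectSketch` is a hypothetical helper written for illustration, assuming each label becomes one `prefix + label` property set to 1 when it matches `data[name]` and 0 otherwise.

```javascript
// Hypothetical sketch of one-hot encoding a single object.
function encodeObjectSketch(data, { labels, prefix, name }) {
  const encoded = {};
  for (const label of labels) {
    // 1 for the label that matches the object's value, 0 for the rest.
    encoded[`${prefix}${label}`] = data[name] === label ? 1 : 0;
  }
  return encoded;
}

const encodedFruit = encodeObjectSketch(
  { fruit: 'apple' },
  { labels: ['apple', 'orange', 'banana'], prefix: 'fruit_', name: 'fruit' }
);
console.log(encodedFruit); // { fruit_apple: 1, fruit_orange: 0, fruit_banana: 0 }
```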
public static getBinaryValue(value: String | Number): Number
returns 0 or 1 depending on the input value
Params:
Name | Type | Attribute | Description |
value | String | Number | | value to convert to a 1 or a 0 |
Return:
Number | 0 or 1 depending on truthiness of value |
Example:
DataSet.getBinaryValue('true') // => 1
DataSet.getBinaryValue('false') // => 0
DataSet.getBinaryValue('No') // => 0
DataSet.getBinaryValue(false) // => 0
public static getTransforms(transforms: Object): Array<Object>
Allows for fit transform shorthand notation
Params:
Name | Type | Attribute | Description |
transforms | Object | | |
Return:
Array<Object> | returns fit columns, columns property |
Example:
DataSet.getTransforms({
  Age: ['scale',],
  Rating: ['label',],
}); // => [
//   {
//     name: 'Age', options: { strategy: 'scale', },
//   },
//   {
//     name: 'Rating', options: { strategy: 'label', },
//   },
// ];
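The shorthand expansion can be sketched as follows. `getTransformsSketch` is a hypothetical name, and the sketch only handles the single-strategy form shown in the example; how the library treats additional array elements is not modelled here.

```javascript
// Hypothetical sketch: expand { column: [strategy] } into fitColumns-style entries.
function getTransformsSketch(transforms) {
  return Object.keys(transforms).map((name) => {
    const [strategy] = transforms[name];
    return { name, options: { strategy } };
  });
}

const columns = getTransformsSketch({ Age: ['scale'], Rating: ['label'] });
console.log(columns);
// [ { name: 'Age', options: { strategy: 'scale' } },
//   { name: 'Rating', options: { strategy: 'label' } } ]
```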
public static mapToObject(mapObj: Map): Object
returns a JavaScript Object from a Map (supports nested Map Objects)
Params:
Name | Type | Attribute | Description |
mapObj | Map | | Map to convert into JavaScript Object |
Return:
Object | JavaScript Object converted from a Map |
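A conversion of this kind can be sketched with a short recursive function. `mapToObjectSketch` is a hypothetical stand-in written for illustration, assuming nested Map values are converted recursively and all other values are copied as-is.

```javascript
// Hypothetical sketch: recursively convert a Map (and nested Maps) to a plain object.
function mapToObjectSketch(mapObj) {
  const result = {};
  for (const [key, value] of mapObj) {
    result[key] = value instanceof Map ? mapToObjectSketch(value) : value;
  }
  return result;
}

const nested = new Map([
  ['scalers', new Map([['Age', 'log']])],
  ['count', 2],
]);
console.log(mapToObjectSketch(nested)); // { scalers: { Age: 'log' }, count: 2 }
```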
public static oneHotDecoder(name: string, options: *): Array<Object>
Returns the original column values decoded from one-hot encoded data
Params:
Name | Type | Attribute | Description |
name | string | | column name |
options | * | | |
Return:
Array<Object> | returns an array of objects decoded from a one-hot encoded column |
Example:
const csvData = [{
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
},
{
'Country': 'Mexico',
'Age': '27',
'Salary': '48000',
'Purchased': 'Yes',
},
...
];
const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
EncodedCSVDataSet.fitColumns({
columns: [
{
name: 'Country',
options: { strategy: 'onehot', },
},
],
});
EncodedCSVDataSet.oneHotDecoder('Country'); // =>
// [ { Country: 'Brazil' },
// { Country: 'Mexico' },
// { Country: 'Ghana' },
// { Country: 'Mexico' },
// ...]
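Decoding can be pictured as finding, for each row, the `Country_*` property that is set to 1. The sketch below is a standalone illustration under that assumption; `oneHotDecodeSketch` is a hypothetical name, not the library's code.

```javascript
// Hypothetical sketch: recover the original label from one-hot encoded rows.
function oneHotDecodeSketch(name, rows) {
  const prefix = `${name}_`;
  return rows.map((row) => {
    // The "hot" key is the prefixed property whose value is 1.
    const hotKey = Object.keys(row).find((key) => key.startsWith(prefix) && row[key] === 1);
    return { [name]: hotKey ? hotKey.slice(prefix.length) : undefined };
  });
}

const decoded = oneHotDecodeSketch('Country', [
  { Country_Brazil: 1, Country_Mexico: 0, Country_Ghana: 0 },
  { Country_Brazil: 0, Country_Mexico: 1, Country_Ghana: 0 },
]);
console.log(decoded); // [ { Country: 'Brazil' }, { Country: 'Mexico' } ]
```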
public static oneHotEncoder(name: string, options: *): Object
returns a new object of one-hot encoded values
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options | * | | |
Return:
Object |
Example:
// [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana', 'Brazil' ]
const originalCountry = dataset.columnArray('Country');
// { originalCountry:
// { Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
// Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
// Country_Ghana: [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ] },
// }
const oneHotCountryColumn = dataset.oneHotEncoder('Country');
See:
- http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
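The column-wise output shape in the example can be reproduced with a small standalone sketch. `oneHotEncodeColumn` is a hypothetical helper, assuming labels are taken in order of first appearance and named `<column>_<label>`.

```javascript
// Hypothetical sketch: one-hot encode an array of column values.
function oneHotEncodeColumn(name, values) {
  const labels = [...new Set(values)]; // unique labels, in order of first appearance
  const encoded = {};
  for (const label of labels) {
    encoded[`${name}_${label}`] = values.map((value) => (value === label ? 1 : 0));
  }
  return encoded;
}

const oneHot = oneHotEncodeColumn('Country', ['Brazil', 'Mexico', 'Ghana', 'Mexico']);
console.log(oneHot);
// { Country_Brazil: [ 1, 0, 0, 0 ],
//   Country_Mexico: [ 0, 1, 0, 1 ],
//   Country_Ghana: [ 0, 0, 1, 0 ] }
```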
public static reverseColumnMatrix(options: *): Object[]
returns an array of objects by applying labels to matrix of columns
Params:
Name | Type | Attribute | Description |
options | * | | |
options.vectors | Array[] | | array of vectors |
options.labels | String[] | | array of labels |
Return:
Object[] | an array of objects with properties derived from options.labels |
Example:
const data = [{ Age: '44', Salary: '44' },
  { Age: '27', Salary: '27' }];
const AgeDataSet = new MS.DataSet(data);
const dependentVariables = [ [ 'Age', ], [ 'Salary', ], ];
const AgeSalMatrix = AgeDataSet.columnMatrix(dependentVariables); // =>
// [ [ '44', '44' ],
//   [ '27', '27' ] ];
MS.DataSet.reverseColumnMatrix({vectors:AgeSalMatrix,labels:dependentVariables}); // =>
// [ { Age: '44', Salary: '44' },
//   { Age: '27', Salary: '27' } ]
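Reversing a matrix amounts to zipping each row with the labels. The following is a minimal sketch under that assumption; `reverseColumnMatrixSketch` is a hypothetical name, and it accepts labels either flat or nested in the columnMatrix argument style.

```javascript
// Hypothetical sketch: rebuild row objects from a matrix and its column labels.
function reverseColumnMatrixSketch({ vectors, labels }) {
  // Labels may be nested like columnMatrix arguments ([['Age'], ['Salary']]) or flat.
  const names = labels.map((label) => (Array.isArray(label) ? label[0] : label));
  return vectors.map((vector) => {
    const row = {};
    names.forEach((propName, i) => { row[propName] = vector[i]; });
    return row;
  });
}

const restored = reverseColumnMatrixSketch({
  vectors: [['44', '72000'], ['27', '48000']],
  labels: [['Age'], ['Salary']],
});
console.log(restored);
// [ { Age: '44', Salary: '72000' }, { Age: '27', Salary: '48000' } ]
```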
public static reverseColumnVector()
public static selectColumns(names: String[], options: *): Object[]
returns a list of objects with only selected columns as properties
Params:
Name | Type | Attribute | Description |
names | String[] | | array of selected columns |
options | * | | |
Return:
Object[] | an array of objects with properties derived from names |
Example:
const data = [{ Age: '44', Salary: '44' , Height: '34' },
  { Age: '27', Salary: '44' , Height: '50' }];
const AgeDataSet = new MS.DataSet(data);
const cols = [ 'Age', 'Salary' ];
const selectedCols = AgeDataSet.selectColumns(cols); // =>
// [ { Age: '44', Salary: '44' },
//   { Age: '27', Salary: '44' } ]
Public Constructors
public constructor(dataset: Object[]): this
creates a new raw data instance for preprocessing data for machine learning
Params:
Name | Type | Attribute | Description |
dataset | Object[] | | |
Return:
this |
Example:
const dataset = new ms.DataSet(csvData);
Public Members
public columnArray
public columnMatrix
public config
public data
public encodeObject
public encoders
public getTransforms
public labels
public oneHotDecoder
public oneHotEncoder
public reverseColumnMatrix
public reverseColumnVector
public scalers
public selectColumns
Public Methods
public columnDescale(name: string): number[]
Returns a new array of descaled values
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options.strategy | string | | strategy for scaling values |
Return:
number[] | returns an array of descaled values |
Example:
//dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const scaledData = [ 3.784189633918261,
3.295836866004329, 3.4011973816621555, 3.6375861597263857, 3.6888794541139363, 3.5553480614894135, 3.657847344866208, 3.8712010109078907, 3.912023005428146, 3.6109179126442243 ]
dataset.columnDescale('Age') // => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
public columnMerge(name: String, data: Array): Object
returns a new column that is merged onto the data set
Params:
Name | Type | Attribute | Description |
name | String | | name of new Column |
data | Array | | new dataset data |
Return:
Object |
Example:
CSVDataSet.columnMerge('DoubleAge', [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ]); //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }
public columnReducer(name: String, options: Object): Object
returns a new column object by reducing an existing column; used in data preparation to create calculated columns for aggregate statistics
Params:
Name | Type | Attribute | Description |
name | String | | name of new Column |
options | Object | | |
options.columnName | String | | name property for columnArray selection |
options.columnOptions | Object | | options property for columnArray |
options.reducer | Function | | reducer function to reduce into new array, it should push values into the resulting array |
Return:
Object | a new object that has reduced array as the value |
Example:
const reducer = (result, value, index, arr) => {
result.push(value * 2);
return result;
};
CSVDataSet.columnReducer('DoubleAge', {
columnName: 'Age',
reducer,
}); //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }
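The reducer contract (push into `result`, then return it) can be tried out without the library. In this sketch `columnReducerSketch` is a hypothetical helper that assumes the reducer starts from an empty array; values are converted with `Number` so that an empty string becomes 0.

```javascript
// Hypothetical sketch: reduce one column into a new named column.
function columnReducerSketch(data, name, { columnName, reducer }) {
  const column = data.map((row) => row[columnName]);
  // Assumption: the reducer accumulates into an initially empty array.
  return { [name]: column.reduce(reducer, []) };
}

const doubleAge = (result, value) => {
  result.push(Number(value) * 2);
  return result;
};
const reduced = columnReducerSketch(
  [{ Age: '44' }, { Age: '27' }, { Age: '' }],
  'DoubleAge',
  { columnName: 'Age', reducer: doubleAge }
);
console.log(reduced); // { DoubleAge: [ 88, 54, 0 ] }
```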
public columnReplace(name: string, options: *): array | Object[]
returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options | * | | |
options.empty | boolean | | replace empty values |
options.strategy | string | | strategy for replacing value: any array stat method from ml.js (mean, standardDeviation, median) or (label,labelEncoder,onehot,oneHotEncoder) |
Return:
array | Object[] |
Example:
//column Replace returns new Array with replaced missing data
//[ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const ReplacedAgeMeanColumn = dataset.columnReplace('Age',{strategy:'mean'});
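To make the `mean` strategy concrete, here is a standalone sketch of replacing empty entries with the mean of the remaining values. `replaceEmptyWithMean` is a hypothetical helper, and treating `''`, `null` and `undefined` as empty is an assumption rather than the library's documented rule.

```javascript
// Hypothetical sketch: replace empty entries with the mean of the non-empty ones.
function replaceEmptyWithMean(values) {
  const isEmpty = (v) => v === '' || v === null || v === undefined;
  const present = values.filter((v) => !isEmpty(v));
  const mean = present.reduce((sum, v) => sum + Number(v), 0) / present.length;
  // Non-empty values are returned untouched (still strings here).
  return values.map((v) => (isEmpty(v) ? mean : v));
}

const ageColumn = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];
const filled = replaceEmptyWithMean(ageColumn);
console.log(filled[6]); // the mean of the nine known ages, 349 / 9
```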
public columnScale(name: string): number[]
Returns a new array of scaled values which can be reversed (descaled). The scaling transformations are stored on the DataSet
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options.strategy | string | | strategy for scaling values |
Return:
number[] | returns an array of scaled values |
Example:
//dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
dataset.columnScale('Age',{strategy:'log'}) // => [ 3.784189633918261,
// 3.295836866004329, 3.4011973816621555, 3.6375861597263857, 3.6888794541139363, 3.5553480614894135, 3.657847344866208, 3.8712010109078907, 3.912023005428146, 3.6109179126442243 ]
dataset.scalers.get('Age').scale(45) // => 3.8066624897703196
dataset.scalers.get('Age').descale(3.8066624897703196) // => 45
//this supports, log/exponent, minmax/normalization and standardscaling
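The scale/descale pairing for the `log` strategy can be sketched with a plain Map of scaler objects. `createLogScaler` is a hypothetical helper; the sketch assumes the natural logarithm, which matches the values in the example above, and says nothing about how the library builds its other scalers.

```javascript
// Hypothetical sketch: a reversible log scaler stored per column name.
function createLogScaler() {
  return {
    scale: (value) => Math.log(value),
    descale: (scaled) => Math.exp(scaled),
  };
}

const scalers = new Map();
scalers.set('Age', createLogScaler());

const scaledAges = ['44', '27', '30'].map((age) => scalers.get('Age').scale(Number(age)));
// Descaling recovers the original numbers, up to floating point rounding.
const roundTrip = scaledAges.map((scaled) => scalers.get('Age').descale(scaled));
console.log(scaledAges, roundTrip);
```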
public exportFeatures(filter: Function): {labels: Map, encoders: Map, scalers: map}
returns Object of all encoders and scalers
Params:
Name | Type | Attribute | Description |
filter | Function | | filter function |
Return:
{labels: Map, encoders: Map, scalers: map} | JavaScript Object of transforms encoders and scalers(labels, encoders, scalers) |
Example:
const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
csvObj.fitColumns({col1:['label',{binary:true}]});
csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
csvObj.exportFeatures() //=> { labels: { col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, } } }
public filterColumn(filter: Function): Array
returns filtered rows of data
Params:
Name | Type | Attribute | Description |
filter | Function | | filter function |
Return:
Array | filtered array of data |
Example:
const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
csvObj.filterColumn((row)=>row.col1>=2); // =>
//[
// [2,6],
//]
public fitColumns(): Object[]
mutates data property of DataSet by replacing multiple columns in a single command
Params:
Name | Type | Attribute | Description |
options.returnData | Boolean | | return updated DataSet data property |
options.columns | Object[] | | {name:'columnName',options:{strategy:'mean',labelOptions:{}},} |
Return:
Object[] |
Example:
//fit Columns, mutates dataset
dataset.fitColumns({
columns:[{name:'Age',options:{ strategy:'mean'} }]
});
// dataset
// class DataSet
// data:[
// {
// 'Country': 'Brazil',
// 'Age': '38.77777777777778',
// 'Salary': '72000',
// 'Purchased': 'N',
// }
// ...
// ]
public fitInverseTransforms(options: *)
Mutates dataset data by inverting all transforms
Params:
Name | Type | Attribute | Description |
options | * | | |
Example:
dataset.data;
// [{
// Country: 'Brazil',
// Age: 3.784189633918261,
// Salary: '72000',
// Purchased: 'N',
// Country_Brazil: 1,
// Country_Mexico: 0,
// Country_Ghana: 0
// },
// ...
// ]
dataset.fitInverseTransforms(); // =>
// [{
// 'Country': 'Brazil',
// 'Age': '44',
// 'Salary': '72000',
// 'Purchased': 'N',
// },
// ...
// ]
public fitTransforms(options: *)
Mutates dataset data by applying all transforms
Params:
Name | Type | Attribute | Description |
options | * | | |
Example:
dataset.data;
// [{
// 'Country': 'Brazil',
// 'Age': '44',
// 'Salary': '72000',
// 'Purchased': 'N',
// },
// ...
// ]
dataset.fitTransforms(); // =>
// [{
// Country: 'Brazil',
// Age: 3.784189633918261,
// Salary: '72000',
// Purchased: 'N',
// Country_Brazil: 1,
// Country_Mexico: 0,
// Country_Ghana: 0
// },
// ...
// ]
public importFeatures(features: {labels: Map, encoders: Map, scalers: map})
sets encoders, labels and scalers
Params:
Name | Type | Attribute | Description |
features | {labels: Map, encoders: Map, scalers: map} | | JavaScript Object of transforms encoders and scalers(labels, encoders, scalers) |
Example:
const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
csvObj.fitColumns({col1:['label',{binary:true}]});
csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
csvObj.exportFeatures() //=> { labels: { col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, } } }
public inverseTransformObject(data: *, options: *): Object
Inverts the transforms applied to an object
Params:
Name | Type | Attribute | Description |
data | * | | |
options | * | | |
Return:
Object | returns object with inverse transformed data |
Example:
dataset.data; //[{
// Age: 0.6387122698222066,
// Salary: 72000,
// Purchased: 0,
// Country_Brazil: 1,
// Country_Mexico: 0,
// Country_Ghana: 0,
// }, ...]
dataset.inverseTransformObject(dataset.data[0]); // => {
// Country: 'Brazil',
// Age: 44,
// Salary: 72000,
// Purchased: 'N',
// };
public labelDecode(name: string, options: *): array
returns a new array and decodes an encoded column back to the original array values
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options | * | | |
Return:
array |
public labelEncoder(name: string, options: *): array
returns a new array and label encodes a selected column
Params:
Name | Type | Attribute | Description |
name | string | | csv column header, or JSON object property name |
options | * | | |
options.binary | boolean | | only replace values with binary values (0,1) |
options.sortFunction | function | | custom label encoding value sort function |
Return:
array |
Example:
// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const originalPurchasedColumn = dataset.columnArray('Purchased');
// [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased',{ binary:true });
// [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
const encodedPurchasedColumn = dataset.labelEncoder('Purchased');
See:
- http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
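The non-binary output above is consistent with assigning integer codes in order of first appearance. The sketch below illustrates that idea and its inverse; `labelEncodeSketch` and `labelDecodeSketch` are hypothetical helpers, and the library's default ordering (or a custom `sortFunction`) may differ.

```javascript
// Hypothetical sketch: label encode by order of first appearance.
function labelEncodeSketch(values) {
  const labels = new Map(); // label -> integer code
  const encoded = values.map((value) => {
    if (!labels.has(value)) labels.set(value, labels.size);
    return labels.get(value);
  });
  return { encoded, labels };
}

// Invert the label map to turn codes back into the original values.
function labelDecodeSketch(encoded, labels) {
  const byCode = new Map([...labels].map(([label, code]) => [code, label]));
  return encoded.map((code) => byCode.get(code));
}

const purchased = ['N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes'];
const { encoded, labels } = labelEncodeSketch(purchased);
console.log(encoded); // [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
console.log(labelDecodeSketch(encoded, labels)); // the original purchased array
```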
public oneHotColumnArray(name: string, options: *): Array<Object>
Returns one-hot encoded data
Params:
Name | Type | Attribute | Description |
name | string | | column name |
options | * | | |
Return:
Array<Object> | returns an array of objects from a one-hot encoded column |
Example:
const csvData = [{
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
},
{
'Country': 'Mexico',
'Age': '27',
'Salary': '48000',
'Purchased': 'Yes',
},
...
];
const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
EncodedCSVDataSet.fitColumns({
columns: [
{
name: 'Country',
options: { strategy: 'onehot', },
},
],
});
EncodedCSVDataSet.oneHotColumnArray('Country'); // =>
// [ { Country_Brazil: 1, Country_Mexico: 0, Country_Ghana: 0 },
// { Country_Brazil: 0, Country_Mexico: 1, Country_Ghana: 0 },
// { Country_Brazil: 0, Country_Mexico: 0, Country_Ghana: 1 },
// ...]
public transformObject(data: *, options: *): Object
transforms an object and replaces values that have been scaled or encoded
Params:
Name | Type | Attribute | Description |
data | * | | |
options | * | | |
Return:
Object |
Example:
dataset.transformObject({
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
}); // =>
// {
// Country: 'Brazil',
// Age: 3.784189633918261,
// Salary: '72000',
// Purchased: 'N',
// Country_Brazil: 1,
// Country_Mexico: 0,
// Country_Ghana: 0
// }