import {DataSet} from 'modelscript/src/DataSet.mjs'
public class | source

DataSet

class for manipulating an array of objects, typically from CSV data

Static Method Summary

Static Public Methods
public static

columnArray(name: string, options: *): array

returns a new array of a selected column from an array of objects, can filter, scale and replace values

public static

columnMatrix(vectors: Array, data: Array): Array

returns a matrix of values by combining column arrays into a matrix

public static

encodeObject(data: Object, options: {labels: Array<String>, prefix: String, name: String}): Object

Converts an object into a one-hot encoded object

public static

getBinaryValue(value: String | Number): Number

returns 0 or 1 depending on the input value

public static

getTransforms(transforms: Object): Array<Object>

Allows for fit transform short hand notation

public static

mapToObject(mapObj: Map): Object

returns a JavaScript Object from a Map (supports nested Map Objects)

public static

oneHotDecoder(name: string, options: *): Array<Object>

Returns the decoded values of a one-hot encoded column

public static

oneHotEncoder(name: string, options: *): Object

returns a new object of one hot encoded values

public static

reverseColumnMatrix(options: *): Object[]

returns an array of objects by applying labels to matrix of columns

public static

reverseColumnVector()

public static

selectColumns(names: String[], options: *): Object[]

returns a list of objects with only selected columns as properties

Constructor Summary

Public Constructor
public

constructor(dataset: Object[]): this

creates a new raw data instance for preprocessing data for machine learning

Member Summary

Public Members
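public columnArray
public columnMatrix
public config
public data
public encodeObject
public encoders
public getTransforms
public labels
public oneHotDecoder
public oneHotEncoder
public reverseColumnMatrix
public reverseColumnVector
public scalers
public selectColumns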

Method Summary

Public Methods
public

columnDescale(name: string): number[]

Returns a new array of descaled values

public

columnMerge(name: String, data: Array): Object

it returns a new column that is merged onto the data set

public

columnReducer(name: String, options: Object): Object

Returns a new column object produced by reducing an existing column; used in data prep to create calculated columns for aggregate statistics

public

columnReplace(name: string, options: *): array | Object[]

returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values

public

columnScale(name: string): number[]

Returns a new array of scaled values, which can be reversed (descaled).

public

exportFeatures(filter: Function): {labels: Map, encoders: Map, scalers: map}

returns Object of all encoders and scalers

public

filterColumn(filter: Function): Array

returns filtered rows of data

public

fitColumns(): Object[]

mutates data property of DataSet by replacing multiple columns in a single command

public

fitInverseTransforms(options: *)

Mutates dataset data by inverting all transforms

public

fitTransforms(options: *)

Mutate dataset data with all transforms

public

importFeatures(features: {labels: Map, encoders: Map, scalers: map})

set encoders, labels and scalers

public

inverseTransformObject(data: *, options: *): Object

Inverts the transforms on an object

public

labelDecode(name: string, options: *): array

returns a new array and decodes an encoded column back to the original array values

public

labelEncoder(name: string, options: *): array

returns a new array and label encodes a selected column

public

oneHotColumnArray(name: string, options: *): Array<Object>

Returns one-hot encoded data

public

transformObject(data: *, options: *): Object

transforms an object and replaces values that have been scaled or encoded

Static Public Methods

public static columnArray(name: string, options: *): array source

returns a new array of a selected column from an array of objects, can filter, scale and replace values

Params:

NameTypeAttributeDescription
name string

csv column header, or JSON object property name

options *
options.prefilter function
  • optional

prefilter values to return

options.filter function
  • optional

filter values to return

options.replace.test function
  • optional
  • default: undefined

test function for replacing values (arr[val])

options.replace.value string | number | function
  • optional
  • default: undefined

value to replace (arr[val]) if replace test is true, if a function (result,val,index,arr,name)=>your custom value

options.parseIntBase number
  • optional
  • default: 10

radix value for parseInt

options.parseFloat boolean
  • optional
  • default: false

convert values to floats

options.parseInt boolean
  • optional
  • default: false

converts values to ints

options.scale boolean
  • optional
  • default: false

standard or minmax feature scale values

Return:

array

Example:

//column Array returns column of data by name
// [ '44','27','30','38','40','35','','48','50', '37' ]
const OriginalAgeColumn = dataset.columnArray('Age'); 
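
The options in the parameter table can be pictured with a small standalone sketch. `columnArraySketch` is a hypothetical helper, not modelscript's implementation: it only borrows the option names listed above (`filter`, `replace.test`, `replace.value`, `parseInt`, `parseIntBase`, `parseFloat`), and the order in which they are applied here is an assumption. For brevity, `replace.value` is treated as a plain value rather than a function.

```javascript
// Hypothetical stand-in for columnArray; not the library's code.
function columnArraySketch(data, name, options = {}) {
  const { filter, replace, parseIntBase = 10 } = options;
  // Pull the named property out of every row.
  let column = data.map((row) => row[name]);
  if (typeof filter === 'function') column = column.filter(filter);
  // Swap in the replacement wherever the test passes.
  if (replace && typeof replace.test === 'function') {
    column = column.map((val) => (replace.test(val) ? replace.value : val));
  }
  if (options.parseInt) column = column.map((val) => Number.parseInt(val, parseIntBase));
  if (options.parseFloat) column = column.map((val) => Number.parseFloat(val));
  return column;
}

const rows = [{ Age: '44' }, { Age: '27' }, { Age: '' }];
const ages = columnArraySketch(rows, 'Age', {
  replace: { test: (val) => val === '', value: '0' },
  parseInt: true,
});
console.log(ages); // [ 44, 27, 0 ]
```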

public static columnMatrix(vectors: Array, data: Array): Array source

returns a matrix of values by combining column arrays into a matrix

Params:

NameTypeAttributeDescription
vectors Array
  • optional
  • default: []

array of arguments for columnArray to merge columns into a matrix

data Array
  • optional
  • default: []

array of data to convert to matrix

Return:

Array

a matrix of column values

Example:

const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
csvObj.columnMatrix([['col1',{parseInt:true}],['col2']]); // =>
//[ 
//  [1,5], 
//  [2,6], 
//]
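
As a rough illustration of how column arrays combine into a matrix, the sketch below builds one row per record and one entry per requested column. `columnMatrixSketch` is a hypothetical function written for this page; only the `[name, options]` vector shape is taken from the example above, and only the `parseInt` option is handled.

```javascript
// Hypothetical sketch: each vector is [columnName, options], as in the example above.
function columnMatrixSketch(data, vectors) {
  return data.map((row) =>
    vectors.map(([name, options = {}]) =>
      (options.parseInt ? Number.parseInt(row[name], 10) : row[name])));
}

const matrix = columnMatrixSketch(
  [{ col1: '1', col2: 5 }, { col1: '2', col2: 6 }],
  [['col1', { parseInt: true }], ['col2']],
);
console.log(matrix); // [ [ 1, 5 ], [ 2, 6 ] ]
```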

public static encodeObject(data: Object, options: {labels: Array<String>, prefix: String, name: String}): Object source

Converts an object into a one-hot encoded object

Params:

NameTypeAttributeDescription
data Object

object to encode

options {labels: Array<String>, prefix: String, name: String}

encoded object options

Return:

Object

one hot encoded object

Example:

const labels = ['apple', 'orange', 'banana',];
const prefix = 'fruit_';
const name = 'fruit';
const options = { labels, prefix, name, };
const data = {
fruit: 'apple',
};
EncodedCSVDataSet.encodeObject(data, options); // => { fruit_apple: 1, fruit_orange: 0, fruit_banana: 0, }
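
One way to picture the encoding step is the standalone sketch below. `encodeObjectSketch` is a hypothetical function, not the library's code; it assumes each label becomes a `prefix + label` key that is 1 when the label matches `data[name]` and 0 otherwise, which is what the documented output suggests.

```javascript
// Hypothetical sketch of one-hot encoding a single object.
function encodeObjectSketch(data, { labels, prefix, name }) {
  const encoded = {};
  for (const label of labels) {
    // 1 for the label that matches the object's value, 0 for every other label.
    encoded[`${prefix}${label}`] = data[name] === label ? 1 : 0;
  }
  return encoded;
}

const encodedFruit = encodeObjectSketch(
  { fruit: 'apple' },
  { labels: ['apple', 'orange', 'banana'], prefix: 'fruit_', name: 'fruit' },
);
console.log(encodedFruit); // { fruit_apple: 1, fruit_orange: 0, fruit_banana: 0 }
```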

public static getBinaryValue(value: String | Number): Number source

returns 0 or 1 depending on the input value

Params:

NameTypeAttributeDescription
value String | Number
  • optional
  • default: ''

value to convert to a 1 or a 0

Return:

Number

0 or 1; false-like inputs such as 'false', 'No' and false map to 0

Example:

DataSet.getBinaryValue('true') // => 1
DataSet.getBinaryValue('false') // => 0
DataSet.getBinaryValue('No') // => 0
DataSet.getBinaryValue(false) // => 0

public static getTransforms(transforms: Object): Array<Object> source

Allows for fit transform short hand notation

Params:

NameTypeAttributeDescription
transforms Object

Return:

Array<Object>

returns fit columns, columns property

Example:

DataSet.getTransforms({
  Age: ['scale',],
  Rating: ['label',],
}); //=> [
//   {
//    name: 'Age', options: { strategy: 'scale', },
//   },
//   {
//    name: 'Rating', options: { strategy: 'label', },
//   },
// ];
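
The shorthand expansion can be sketched as follows. `getTransformsSketch` is a hypothetical function, and it assumes that an optional second array element (as in the `['label', {binary: true}]` form used in the exportFeatures example) carries extra options that are merged next to `strategy`.

```javascript
// Hypothetical sketch: expand { column: [strategy, extraOptions] } into fitColumns-style entries.
function getTransformsSketch(transforms = {}) {
  return Object.entries(transforms).map(([name, [strategy, extra = {}]]) => ({
    name,
    options: { strategy, ...extra },
  }));
}

const expanded = getTransformsSketch({
  Age: ['scale'],
  Rating: ['label', { binary: true }],
});
console.log(JSON.stringify(expanded));
// [{"name":"Age","options":{"strategy":"scale"}},{"name":"Rating","options":{"strategy":"label","binary":true}}]
```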

public static mapToObject(mapObj: Map): Object source

returns a JavaScript Object from a Map (supports nested Map Objects)

Params:

NameTypeAttributeDescription
mapObj Map

Map to convert into JavaScript Object

Return:

Object

JavaScript Object converted from a Map
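
A minimal sketch of converting a nested Map into a plain object is shown below. `mapToObjectSketch` is a hypothetical function written for illustration; it assumes that any value which is itself a Map should be converted recursively, and the sample keys are invented.

```javascript
// Hypothetical sketch: recursively convert a Map (and any nested Maps) to a plain object.
function mapToObjectSketch(mapObj) {
  const result = {};
  for (const [key, value] of mapObj) {
    result[key] = value instanceof Map ? mapToObjectSketch(value) : value;
  }
  return result;
}

const nested = new Map([
  ['labels', new Map([['Purchased', new Map([['Yes', 1], ['No', 0]])]])],
]);
const plain = mapToObjectSketch(nested);
console.log(JSON.stringify(plain)); // {"labels":{"Purchased":{"Yes":1,"No":0}}}
```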


public static oneHotDecoder(name: string, options: *): Array<Object> source

Returns the decoded values of a one-hot encoded column

Params:

NameTypeAttributeDescription
name string

column name

options *

Return:

Array<Object>

returns an array of objects from a one-hot encoded column

Example:

const csvData = [{
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
},
{
'Country': 'Mexico',
'Age': '27',
'Salary': '48000',
'Purchased': 'Yes',
},
...
];
const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
EncodedCSVDataSet.fitColumns({
columns: [
{
name: 'Country',
options: { strategy: 'onehot', },
},
],
});

EncodedCSVDataSet.oneHotDecoder('Country');// =>
// [ { Country: 'Brazil' },
//  { Country: 'Mexico' },
//  { Country: 'Ghana' },
//  { Country: 'Mexico' },
//   ...]
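
Decoding can be pictured as finding, for each row, the prefixed key whose value is 1. The sketch below is a hypothetical standalone function, not modelscript's code; the `Country_` prefix convention is taken from the encoded output shown elsewhere on this page.

```javascript
// Hypothetical sketch: recover the original label from one-hot encoded rows.
function oneHotDecodeSketch(rows, name, prefix = `${name}_`) {
  return rows.map((row) => {
    // The "hot" key is the prefixed property set to 1.
    const hotKey = Object.keys(row).find((key) => key.startsWith(prefix) && row[key] === 1);
    return { [name]: hotKey === undefined ? undefined : hotKey.slice(prefix.length) };
  });
}

const decodedCountries = oneHotDecodeSketch([
  { Country_Brazil: 1, Country_Mexico: 0, Country_Ghana: 0 },
  { Country_Brazil: 0, Country_Mexico: 1, Country_Ghana: 0 },
], 'Country');
console.log(decodedCountries); // [ { Country: 'Brazil' }, { Country: 'Mexico' } ]
```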

public static oneHotEncoder(name: string, options: *): Object source

returns a new object of one hot encoded values

Params:

NameTypeAttributeDescription
name string

csv column header, or JSON object property name

options *

Return:

Object

Example:

// [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana', 'Brazil' ]
const originalCountry = dataset.columnArray('Country'); 

// { originalCountry:
//    { Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
//      Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
//      Country_Ghana: [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ] },
//     }
const oneHotCountryColumn = dataset.oneHotEncoder('Country'); 

See:

  • http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
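
For intuition, one-hot encoding a column amounts to building one 0/1 array per distinct label. The sketch below is a hypothetical standalone function (not the library's implementation) that assumes labels are ordered by first appearance and keys are formed as `name_label`, as in the output above.

```javascript
// Hypothetical sketch: one-hot encode a column into per-label indicator arrays.
function oneHotEncodeColumnSketch(column, name) {
  const labels = [...new Set(column)]; // distinct values, in first-seen order
  const encoded = {};
  for (const label of labels) {
    encoded[`${name}_${label}`] = column.map((val) => (val === label ? 1 : 0));
  }
  return encoded;
}

const hotCountry = oneHotEncodeColumnSketch(['Brazil', 'Mexico', 'Ghana', 'Mexico'], 'Country');
console.log(hotCountry);
// { Country_Brazil: [ 1, 0, 0, 0 ], Country_Mexico: [ 0, 1, 0, 1 ], Country_Ghana: [ 0, 0, 1, 0 ] }
```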

public static reverseColumnMatrix(options: *): Object[] source

returns an array of objects by applying labels to matrix of columns

Params:

NameTypeAttributeDescription
options *
options.vectors Array[]

array of vectors

options.labels String[]

array of labels

Return:

Object[]

an array of objects with properties derived from options.labels

Example:

const data = [{ Age: '44', Salary: '72000' },
{ Age: '27', Salary: '48000' }]
const AgeDataSet = new MS.DataSet(data);
const dependentVariables = [ [ 'Age', ], [ 'Salary', ], ];
const AgeSalMatrix = AgeDataSet.columnMatrix(dependentVariables); // =>
//  [ [ '44', '72000' ],
//  [ '27', '48000' ] ];
MS.DataSet.reverseColumnMatrix({vectors:AgeSalMatrix,labels:dependentVariables}); // => [{ Age: '44', Salary: '72000' },
// { Age: '27', Salary: '48000' }]
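
The reverse mapping can be sketched as zipping each matrix row with the label list. `reverseColumnMatrixSketch` is a hypothetical function; it assumes a label may be given either as a string or as a one-element array (the form used for `dependentVariables` above).

```javascript
// Hypothetical sketch: turn matrix rows back into labelled objects.
function reverseColumnMatrixSketch({ vectors, labels }) {
  // Accept 'Age' as well as ['Age'].
  const names = labels.map((label) => (Array.isArray(label) ? label[0] : label));
  return vectors.map((vector) =>
    Object.fromEntries(vector.map((val, i) => [names[i], val])));
}

const rebuilt = reverseColumnMatrixSketch({
  vectors: [['44', '72000'], ['27', '48000']],
  labels: [['Age'], ['Salary']],
});
console.log(rebuilt);
// [ { Age: '44', Salary: '72000' }, { Age: '27', Salary: '48000' } ]
```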

public static reverseColumnVector() source

public static selectColumns(names: String[], options: *): Object[] source

returns a list of objects with only selected columns as properties

Params:

NameTypeAttributeDescription
names String[]

array of selected columns

options *

Return:

Object[]

an array of objects with properties derived from names

Example:

const data = [{ Age: '44', Salary: '44' , Height: '34' },
{ Age: '27', Salary: '44' , Height: '50'  }]
const AgeDataSet = new MS.DataSet(data);
const cols = [ 'Age', 'Salary' ];
const selectedCols = AgeDataSet.selectColumns(cols); // => [{ Age: '44', Salary: '44' },
// { Age: '27', Salary: '44' }]
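
Column selection is essentially a projection of each row onto the requested property names. The following hypothetical sketch (not modelscript's code) shows the idea with plain objects.

```javascript
// Hypothetical sketch: keep only the named properties of every row.
function selectColumnsSketch(data, names) {
  return data.map((row) => Object.fromEntries(names.map((name) => [name, row[name]])));
}

const selected = selectColumnsSketch(
  [{ Age: '44', Salary: '44', Height: '34' }, { Age: '27', Salary: '44', Height: '50' }],
  ['Age', 'Salary'],
);
console.log(selected);
// [ { Age: '44', Salary: '44' }, { Age: '27', Salary: '44' } ]
```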

Public Constructors

public constructor(dataset: Object[]): this source

creates a new raw data instance for preprocessing data for machine learning

Params:

NameTypeAttributeDescription
dataset Object[]

Return:

this

Example:

const dataset = new ms.DataSet(csvData);

Public Members

public columnArray source

public columnMatrix source

public config source

public data source

public encodeObject source

public encoders source

public getTransforms source

public labels source

public oneHotDecoder source

public oneHotEncoder source

public reverseColumnMatrix source

public reverseColumnVector source

public scalers source

public selectColumns source

Public Methods

public columnDescale(name: string): number[] source

Returns a new array of descaled values

Params:

NameTypeAttributeDescription
name string

name - csv column header, or JSON object property name

options.strategy string
  • optional
  • default: "log"

strategy for scaling values

Return:

number[]

returns an array of descaled values

Example:

//dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const scaledData = [ 3.784189633918261,
3.295836866004329, 3.4011973816621555, 3.6375861597263857, 3.6888794541139363, 3.5553480614894135, 3.657847344866208, 3.8712010109078907, 3.912023005428146, 3.6109179126442243 ]
dataset.columnDescale('Age') // => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]

public columnMerge(name: String, data: Array): Object source

it returns a new column that is merged onto the data set

Params:

NameTypeAttributeDescription
name String

name of new Column

data Array

new dataset data

Return:

Object

Example:

CSVDataSet.columnMerge('DoubleAge', [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ]); //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }

public columnReducer(name: String, options: Object): Object source

Returns a new column object produced by reducing an existing column; used in data prep to create calculated columns for aggregate statistics

Params:

NameTypeAttributeDescription
name String

name of new Column

options Object
options.columnName String

name property for columnArray selection

options.columnOptions Object

options property for columnArray

options.reducer Function

reducer function to reduce into new array, it should push values into the resulting array

Return:

Object

a new object that has reduced array as the value

Example:

const reducer = (result, value, index, arr) => {
result.push(value * 2);
return result;
};
CSVDataSet.columnReducer('DoubleAge', {
columnName: 'Age',
reducer,
}); //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }

public columnReplace(name: string, options: *): array | Object[] source

returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values

Params:

NameTypeAttributeDescription
name string

csv column header, or JSON object property name

options *
options.empty boolean
  • optional
  • default: true

replace empty values

options.strategy string
  • optional
  • default: "mean"

strategy for replacing value, any array stat method from ml.js (mean, standardDeviation, median) or (label,labelEncoder,onehot,oneHotEncoder)

Return:

array | Object[]

Example:

//column Replace returns new Array with replaced missing data
//[ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const ReplacedAgeMeanColumn = dataset.columnReplace('Age',{strategy:'mean'});
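
The 'mean' strategy can be illustrated with a standalone sketch: empty entries are replaced by the mean of the remaining values. `replaceEmptyWithMean` is a hypothetical helper, and treating `''`, `undefined` and `null` as "empty" is an assumption, not a statement about the library.

```javascript
// Hypothetical sketch of mean imputation for a single column.
const isEmpty = (val) => val === '' || val === undefined || val === null;

function replaceEmptyWithMean(column) {
  const present = column.filter((val) => !isEmpty(val)).map(Number);
  const mean = present.reduce((sum, val) => sum + val, 0) / present.length;
  // Non-empty values are left untouched (still strings here).
  return column.map((val) => (isEmpty(val) ? mean : val));
}

const agesWithGap = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];
const filledAges = replaceEmptyWithMean(agesWithGap);
console.log(filledAges[6]); // 349 / 9, roughly 38.78
```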

public columnScale(name: string): number[] source

Returns a new array of scaled values, which can be reversed (descaled). The scaling transformations are stored on the DataSet

Params:

NameTypeAttributeDescription
name string

name - csv column header, or JSON object property name

options.strategy string
  • optional
  • default: "log"

strategy for scaling values

Return:

number[]

returns an array of scaled values

Example:

//dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
dataset.columnScale('Age',{strategy:'log'}) // => [ 3.784189633918261,
// 3.295836866004329, 3.4011973816621555, 3.6375861597263857, 3.6888794541139363, 3.5553480614894135, 3.657847344866208, 3.8712010109078907, 3.912023005428146, 3.6109179126442243 ]
dataset.scalers.get('Age').scale(45) // => 3.8066624897703196
dataset.scalers.get('Age').descale(3.8066624897703196) // => 45
// supports log/exponent, minmax/normalization and standard scaling
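
The 'log' strategy and its inverse can be sketched without the library. The scaler object and the Map below are hypothetical stand-ins for what the DataSet stores under `scalers`; the sketch assumes 'log' means the natural logarithm, which agrees with the values in the example (ln 44 is about 3.7842).

```javascript
// Hypothetical sketch of a reversible log scaler kept in a Map keyed by column name.
function makeLogScaler() {
  return {
    scale: (value) => Math.log(Number(value)),
    descale: (scaled) => Math.exp(scaled),
  };
}

const scalersSketch = new Map([['Age', makeLogScaler()]]);
const ageScaler = scalersSketch.get('Age');

const scaledAges = ['44', '27'].map((age) => ageScaler.scale(age));
// Descaling recovers the original numbers up to floating-point rounding.
const restoredAges = scaledAges.map((val) => ageScaler.descale(val));
console.log(scaledAges, restoredAges);
```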

public exportFeatures(filter: Function): {labels: Map, encoders: Map, scalers: map} source

returns Object of all encoders and scalers

Params:

NameTypeAttributeDescription
filter Function
  • optional

filter function

Return:

{labels: Map, encoders: Map, scalers: map}

JavaScript Object of transforms encoders and scalers(labels, encoders, scalers)

Example:

const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
csvObj.fitColumns({col1:['label',{binary:true}]}); 
csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
csvObj.exportFeatures() //=> { labels: { col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, } } }

public filterColumn(filter: Function): Array source

returns filtered rows of data

Params:

NameTypeAttributeDescription
filter Function
  • optional

filter function

Return:

Array

filtered array of data

Example:

const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
csvObj.filterColumn((row)=>row.col1>=2); // =>
//[ 
//  {col1:2,col2:6}, 
//]

public fitColumns(): Object[] source

mutates data property of DataSet by replacing multiple columns in a single command

Params:

NameTypeAttributeDescription
options.returnData Boolean

return updated DataSet data property

options.columns Object[]

{name:'columnName',options:{strategy:'mean',labelOptions:{}},}

Return:

Object[]

Example:

//fit Columns, mutates dataset
dataset.fitColumns({
columns:[{name:'Age',options:{ strategy:'mean'} }]
});
// dataset
// class DataSet
//   data:[
//     {
//       'Country': 'Brazil',
//       'Age': '38.77777777777778',
//       'Salary': '72000',
//       'Purchased': 'N',
//     }
//     ...
//   ]

public fitInverseTransforms(options: *) source

Mutates dataset data by inverting all transforms

Params:

NameTypeAttributeDescription
options *

Example:

DataSet.data;
// [{ 
//  Country: 'Brazil',
//  Age: 3.784189633918261,
//  Salary: '72000',
//  Purchased: 'N',
//  Country_Brazil: 1,
//  Country_Mexico: 0,
//  Country_Ghana: 0
// },
// ...
// ]
DataSet.fitInverseTransforms(); // =>
// [{
//   'Country': 'Brazil',
//   'Age': '44',
//   'Salary': '72000',
//   'Purchased': 'N',
// },
// ...
// ]

public fitTransforms(options: *) source

Mutate dataset data with all transforms

Params:

NameTypeAttributeDescription
options *

Example:

DataSet.data;
// [{
//   'Country': 'Brazil',
//   'Age': '44',
//   'Salary': '72000',
//   'Purchased': 'N',
// },
// ...
// ]
DataSet.fitTransforms(); // =>
// [{ 
//  Country: 'Brazil',
//  Age: 3.784189633918261,
//  Salary: '72000',
//  Purchased: 'N',
//  Country_Brazil: 1,
//  Country_Mexico: 0,
//  Country_Ghana: 0
// },
// ...
// ] 

public importFeatures(features: {labels: Map, encoders: Map, scalers: map}) source

set encoders, labels and scalers

Params:

NameTypeAttributeDescription
features {labels: Map, encoders: Map, scalers: map}
  • optional
  • default: {}

JavaScript Object of transforms encoders and scalers(labels, encoders, scalers)

Example:

const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
csvObj.fitColumns({col1:['label',{binary:true}]}); 
csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
csvObj.exportFeatures() //=> { labels: { col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, } } }

public inverseTransformObject(data: *, options: *): Object source

Inverts the transforms on an object

Params:

NameTypeAttributeDescription
data *
options *

Return:

Object

returns object with inverse transformed data

Example:

DataSet.data; //[{
//   Age: 0.6387122698222066,
//   Salary: 72000,
//   Purchased: 0,
//   Country_Brazil: 1,
//   Country_Mexico: 0,
//   Country_Ghana: 0,
// }, ...] 
DataSet.inverseTransformObject(DataSet.data[0]); // => {
//  Country: 'Brazil', 
//  Age: 44, 
//  Salary: 72000, 
//  Purchased: 'N', 
// };

public labelDecode(name: string, options: *): array source

returns a new array and decodes an encoded column back to the original array values

Params:

NameTypeAttributeDescription
name string

csv column header, or JSON object property name

options *

Return:

array
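
No example is given for this method, so here is a rough standalone sketch of what label decoding involves. `labelDecodeSketch` and the `purchasedLabels` Map are hypothetical; the sketch assumes the encoder kept a label-to-code Map that can be inverted.

```javascript
// Hypothetical sketch: invert a label -> code Map to decode an encoded column.
function labelDecodeSketch(encodedColumn, labelMap) {
  const byCode = new Map([...labelMap].map(([label, code]) => [code, label]));
  return encodedColumn.map((code) => byCode.get(code));
}

const purchasedLabels = new Map([['N', 0], ['Yes', 1], ['No', 2]]);
const decodedPurchased = labelDecodeSketch([0, 1, 2, 1], purchasedLabels);
console.log(decodedPurchased); // [ 'N', 'Yes', 'No', 'Yes' ]
```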

public labelEncoder(name: string, options: *): array source

returns a new array and label encodes a selected column

Params:

NameTypeAttributeDescription
name string

csv column header, or JSON object property name

options *
options.binary boolean
  • optional
  • default: false

encode values as binary (0 or 1) only

options.sortFunction function

custom label encoding value sort function

Return:

array

Example:

const oneHotCountryColumn = dataset.oneHotEncoder('Country'); 

// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ] 
const originalPurchasedColumn = dataset.columnArray('Purchased');
// [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased',{ binary:true });
// [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
const encodedPurchasedColumn = dataset.labelEncoder('Purchased'); 

See:

  • http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
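
The non-binary encoding shown above is consistent with assigning codes in order of first appearance. The sketch below is a hypothetical standalone function built on that assumption; it does not model `options.binary` or `options.sortFunction`.

```javascript
// Hypothetical sketch: label-encode a column, numbering labels in first-seen order.
function labelEncodeSketch(column) {
  const labels = new Map();
  const encoded = column.map((val) => {
    if (!labels.has(val)) labels.set(val, labels.size); // next unused code
    return labels.get(val);
  });
  return { encoded, labels };
}

const purchased = ['N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes'];
const { encoded: purchasedCodes, labels: purchasedMap } = labelEncodeSketch(purchased);
console.log(purchasedCodes); // [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
```

Returning the Map alongside the codes is what makes a later decode step possible.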

public oneHotColumnArray(name: string, options: *): Array<Object> source

Returns one-hot encoded data

Params:

NameTypeAttributeDescription
name string

column name

options *

Return:

Array<Object>

returns an array of objects from a one-hot encoded column

Example:

const csvData = [{
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
},
{
'Country': 'Mexico',
'Age': '27',
'Salary': '48000',
'Purchased': 'Yes',
},
...
];
const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
EncodedCSVDataSet.fitColumns({
columns: [
{
name: 'Country',
options: { strategy: 'onehot', },
},
],
});

EncodedCSVDataSet.oneHotColumnArray('Country');// =>
// [ { Country_Brazil: 1, Country_Mexico: 0, Country_Ghana: 0 },
//   { Country_Brazil: 0, Country_Mexico: 1, Country_Ghana: 0 },
//   { Country_Brazil: 0, Country_Mexico: 0, Country_Ghana: 1 },
//   ...]

public transformObject(data: *, options: *): Object source

transforms an object and replaces values that have been scaled or encoded

Params:

NameTypeAttributeDescription
data *
options *

Return:

Object

Example:

DataSet.transformObject({
'Country': 'Brazil',
'Age': '44',
'Salary': '72000',
'Purchased': 'N',
}); // =>
// { 
//  Country: 'Brazil',
//  Age: 3.784189633918261,
//  Salary: '72000',
//  Purchased: 'N',
//  Country_Brazil: 1,
//  Country_Mexico: 0,
//  Country_Ghana: 0
// }
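
To show how stored scalers and encoders might be applied to a single record, here is a hypothetical sketch. `transformObjectSketch` and the shape of its second argument (a Map of scalers and a Map of label lists) are assumptions made for illustration; they are not the DataSet's internal structures.

```javascript
// Hypothetical sketch: apply per-column scalers and one-hot encoders to one object.
function transformObjectSketch(row, { scalers, encoders }) {
  const out = { ...row };
  for (const [name, scaler] of scalers) {
    if (name in out) out[name] = scaler.scale(Number(out[name]));
  }
  for (const [name, labels] of encoders) {
    for (const label of labels) {
      out[`${name}_${label}`] = row[name] === label ? 1 : 0;
    }
  }
  return out;
}

const transformedRow = transformObjectSketch(
  { Country: 'Brazil', Age: '44', Salary: '72000', Purchased: 'N' },
  {
    scalers: new Map([['Age', { scale: Math.log }]]), // log scaling, as in columnScale
    encoders: new Map([['Country', ['Brazil', 'Mexico', 'Ghana']]]),
  },
);
console.log(transformedRow);
```

Untransformed columns such as Salary and Purchased pass through unchanged, matching the shape of the documented output.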