Class DataSet

class for manipulating an array of objects, typically from CSV data

memberof

preprocessing

Hierarchy

  • DataSet

Index

Constructors

constructor

  • new DataSet(data?: Data, options?: {}): DataSet
  • creates a new raw data instance for preprocessing data for machine learning

    example

    const dataset = new ms.DataSet(csvData);

    Parameters

    • data: Data = ...
    • options: {} = ...

    Returns DataSet

Properties

columnArray

columnArray: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

columnMatrix

columnMatrix: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

config

config: {}

Type declaration

  • [index: string]: any

data

data: Data

encodeObject

encodeObject: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

encoders

encoders: any

getTransforms

getTransforms: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

labels

labels: any

oneHotDecoder

oneHotDecoder: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

oneHotEncoder

oneHotEncoder: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

reverseColumnMatrix

reverseColumnMatrix: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

reverseColumnVector

reverseColumnVector: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

scalers

scalers: any

selectColumns

selectColumns: (...args: any[]) => any

Type declaration

    • (...args: any[]): any
    • Parameters

      • Rest ...args: any[]

      Returns any

Static data

data: any

Static encoders

encoders: any

Methods

columnDescale

  • columnDescale(name: any, options: any): any
  • Returns a new array of descaled values

    example

    // dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
    const scaledData = [
      3.784189633918261, 3.295836866004329, 3.4011973816621555,
      3.6375861597263857, 3.6888794541139363, 3.5553480614894135,
      3.657847344866208, 3.8712010109078907, 3.912023005428146,
      3.6109179126442243
    ];
    dataset.columnDescale('Age') // => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]

    Parameters

    • name: any

      name - csv column header, or JSON object property name

    • options: any

    Returns any

    returns an array of descaled values

columnMerge

  • columnMerge(name: any, data?: any[]): {}
  • it returns a new column that is merged onto the data set

    example

    CSVDataSet.columnMerge('DoubleAge', [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ]);
    //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }

    Parameters

    • name: any

      name of new Column

    • data: any[] = ...

      new dataset data

    Returns {}

columnReducer

  • columnReducer(name: any, options: { columnName: any; columnOptions: any; reducer: any }): {}
  • returns a new column object built by reducing an existing column; used in data preparation to create calculated columns for aggregate statistics

    example

    const reducer = (result, value, index, arr) => {
      result.push(value * 2);
      return result;
    };
    CSVDataSet.columnReducer('DoubleAge', {
      columnName: 'Age',
      reducer,
    });
    //=> { DoubleAge: [ 88, 54, 60, 76, 80, 70, 0, 96, 100, 74 ] }

    Parameters

    • name: any

      name of new Column

    • options: { columnName: any; columnOptions: any; reducer: any }
      • columnName: any

        name property for columnArray selection

      • columnOptions: any

        options property for columnArray

      • reducer: any

        reducer function to reduce into new array, it should push values into the resulting array

    Returns {}

    a new object that has reduced array as the value
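
To make the reducer contract concrete, here is a minimal standalone sketch. `reduceColumn` is a hypothetical helper written for illustration, not the library's implementation; it assumes the column is extracted by property name and reduced into an initially empty array.

```javascript
// Hypothetical helper (not part of DataSet): reduce one column of `rows`
// into a new array and return it under the new column name.
function reduceColumn(rows, name, { columnName, reducer }) {
  const column = rows.map((row) => row[columnName]);
  // The reducer is expected to push into, and return, the accumulator array.
  return { [name]: column.reduce(reducer, []) };
}

const rows = [{ Age: '44' }, { Age: '27' }, { Age: '' }];
const doubleAge = (result, value) => {
  result.push(Number(value) * 2); // Number('') is 0, so empty cells become 0
  return result;
};

const derived = reduceColumn(rows, 'DoubleAge', { columnName: 'Age', reducer: doubleAge });
console.log(derived);
```

Returning an object keyed by the new column name matches the documented return shape, which makes the result easy to pass on to a merge step.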

columnReplace

  • columnReplace(name: any, options?: any): any
  • returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values

    example

    //column Replace returns new Array with replaced missing data
    //[ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
    const ReplacedAgeMeanColumn = dataset.columnReplace('Age',{strategy:'mean'});

    Parameters

    • name: any

      csv column header, or JSON object property name

    • options: any = ...

    Returns any
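
The 'mean' strategy in the example above can be pictured with the following sketch. `replaceWithMean` is a hypothetical helper, and treating empty strings, null, and undefined as missing is an assumption; the library may define missing values differently.

```javascript
// Hypothetical helper (not part of DataSet): fill missing cells with the
// mean of the values that are present.
function replaceWithMean(column) {
  const isMissing = (v) => v === '' || v === null || v === undefined;
  const present = column.filter((v) => !isMissing(v)).map(Number);
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  // Present values are returned untouched; only the gaps are filled.
  return column.map((v) => (isMissing(v) ? mean : v));
}

const ages = ['44', '27', '30', '38', '40', '35', '', '48', '50', '37'];
const filledAges = replaceWithMean(ages);
console.log(filledAges[6]); // the mean of the nine present ages, about 38.78
```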

columnScale

  • columnScale(name: any, options?: {}): any
  • Returns a new array of scaled values, which can be reversed (descaled). The scaling transformations are stored on the DataSet

    example

    // dataset.columnArray('Age') => [ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
    dataset.columnScale('Age',{strategy:'log'})
    // => [ 3.784189633918261, 3.295836866004329, 3.4011973816621555,
    //      3.6375861597263857, 3.6888794541139363, 3.5553480614894135,
    //      3.657847344866208, 3.8712010109078907, 3.912023005428146,
    //      3.6109179126442243 ]
    dataset.scalers.get('Age').scale(45) // => 3.8066624897703196
    dataset.scalers.get('Age').descale(3.8066624897703196) // => 45
    //this supports, log/exponent, minmax/normalization and standardscaling

    Parameters

    • name: any

      name - csv column header, or JSON object property name

    • options: {} = ...

    Returns any

    returns an array of scaled values
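
The 'log' strategy and its inverse can be sketched as a scale/descale pair. `createLogScaler` is a hypothetical stand-in for whatever object the DataSet stores under `scalers`; it assumes natural-log scaling, which agrees with the values shown in the example above.

```javascript
// Hypothetical stand-in (not the library's code): a reversible log scaler.
function createLogScaler() {
  return {
    scale: (value) => Math.log(Number(value)), // natural logarithm
    descale: (value) => Math.exp(value),       // inverse of scale
  };
}

const ageScaler = createLogScaler();
const rawAges = ['44', '27', '30'];
const scaledAges = rawAges.map((v) => ageScaler.scale(v));
const restoredAges = scaledAges.map((v) => ageScaler.descale(v));
console.log(scaledAges, restoredAges);
```

Keeping both directions on one object is what makes the transform reversible later: the same scaler that produced a value can restore it, up to floating-point rounding.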

exportFeatures

  • exportFeatures(options?: {}): { encoders: any; labels: any; scalers: any }
  • returns Object of all encoders and scalers

    example

    const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
    csvObj.fitColumns({col1:['label',{binary:true}]});
    csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
    csvObj.exportFeatures() //=>
    // {
    //   labels: {
    //     col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, }
    //   }
    // }

    Parameters

    • options: {} = ...

    Returns { encoders: any; labels: any; scalers: any }

    JavaScript Object of transforms encoders and scalers(labels, encoders, scalers)

    • encoders: any
    • labels: any
    • scalers: any

filterColumn

  • filterColumn(filter?: () => boolean): Datum[]
  • returns filtered rows of data

    example

    const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
    csvObj.filterColumn((row)=>row.col1>=2); // =>
    //[
    //  [2,6],
    //]

    Parameters

    • filter: () => boolean = ...
        • (): boolean
        • Returns boolean

    Returns Datum[]

    filtered array of data

fitColumns

  • fitColumns(options?: any, mockDataOptions?: {}): Data | DataSet
  • mutates data property of DataSet by replacing multiple columns in a single command

    example

    //fit Columns, mutates dataset
    dataset.fitColumns({
      columns:[{name:'Age',options:{ strategy:'mean'} }]
    });
    // dataset
    // class DataSet
    //   data:[
    //     {
    //       'Country': 'Brazil',
    //       'Age': '38.77777777777778',
    //       'Salary': '72000',
    //       'Purchased': 'N',
    //     }
    //     ...
    //   ]

    Parameters

    • options: any = ...
    • mockDataOptions: {} = ...

    Returns Data | DataSet

fitInverseTransforms

  • fitInverseTransforms(options?: any): Data | DataSet
  • Mutate dataset data by inversing all transforms

    example

    DataSet.data;
    // [{
    //   Country: 'Brazil',
    //   Age: 3.784189633918261,
    //   Salary: '72000',
    //   Purchased: 'N',
    //   Country_Brazil: 1,
    //   Country_Mexico: 0,
    //   Country_Ghana: 0
    // },
    // ...
    // ]
    DataSet.fitInverseTransforms(); // =>
    // [{
    //   'Country': 'Brazil',
    //   'Age': '44',
    //   'Salary': '72000',
    //   'Purchased': 'N',
    // },
    // ...
    // ]

    Parameters

    • options: any = ...

    Returns Data | DataSet

fitTransforms

  • fitTransforms(options?: any): Data | DataSet
  • Mutate dataset data with all transforms

    example

    DataSet.data;
    // [{
    //   'Country': 'Brazil',
    //   'Age': '44',
    //   'Salary': '72000',
    //   'Purchased': 'N',
    // },
    // ...
    // ]
    DataSet.fitTransforms(); // =>
    // [{
    //   Country: 'Brazil',
    //   Age: 3.784189633918261,
    //   Salary: '72000',
    //   Purchased: 'N',
    //   Country_Brazil: 1,
    //   Country_Mexico: 0,
    //   Country_Ghana: 0
    // },
    // ...
    // ]

    Parameters

    • options: any = ...

    Returns Data | DataSet

importFeatures

  • importFeatures(features?: any): void
  • set encoders, labels and scalers

    example

    const csvObj = new DataSet([{col1:1,col2:5},{col1:false,col2:6}]);
    csvObj.fitColumns({col1:['label',{binary:true}]});
    csvObj.data // => [{col1:true,col2:5},{col1:false,col2:6}]
    csvObj.exportFeatures() //=>
    // {
    //   labels: {
    //     col1: { "0": false, "1": true, "N": 0, "Yes": 1, "No": 0, "f": 0, "false": 1, }
    //   }
    // }

    Parameters

    • features: any = ...

    Returns void

inverseTransformObject

  • inverseTransformObject(data: {}, options: {}): {}
  • Inverses transform on an object

    example

    DataSet.data;
    //[{
    //   Age: 0.6387122698222066,
    //   Salary: 72000,
    //   Purchased: 0,
    //   Country_Brazil: 1,
    //   Country_Mexico: 0,
    //   Country_Ghana: 0,
    // }, ...]
    DataSet.inverseTransformObject(DataSet.data[0]); // =>
    // {
    //   Country: 'Brazil',
    //   Age: 44,
    //   Salary: 72000,
    //   Purchased: 'N',
    // };

    Parameters

    • data: {}
      • [x: string]: any
    • options: {}

    Returns {}

    returns object with inverse transformed data

    • [x: string]: any

labelDecode

  • labelDecode(name: any, options?: any): any
  • returns a new array and decodes an encoded column back to the original array values

    Parameters

    • name: any

      csv column header, or JSON object property name

    • options: any = ...

    Returns any

labelEncoder

  • labelEncoder(name: any, options: {}): any
  • returns a new array and label encodes a selected column

    example

    const oneHotCountryColumn = dataset.oneHotEncoder('Country');

    // [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
    const originalPurchasedColumn = dataset.labelEncoder('Purchased');
    // [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
    const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased',{ binary:true });
    // [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
    const encodedPurchasedColumn = dataset.labelEncoder('Purchased');

    see

    http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html

    Parameters

    • name: any

      csv column header, or JSON object property name

    • options: {}

    Returns any
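
Non-binary label encoding, as in the last line of the example above, can be sketched as assigning each distinct value the next unused integer in order of first appearance. `labelEncode` is a hypothetical helper; the first-appearance ordering is an assumption that happens to reproduce the documented output.

```javascript
// Hypothetical helper (not part of DataSet): map each distinct value to an
// integer code, numbering values in the order they first appear.
function labelEncode(column) {
  const codes = new Map();
  const encoded = column.map((value) => {
    if (!codes.has(value)) codes.set(value, codes.size);
    return codes.get(value);
  });
  return { encoded, codes };
}

const purchased = ['N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes'];
const { encoded, codes } = labelEncode(purchased);
console.log(encoded); // [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]

// Decoding inverts the code table to recover the original labels.
const labelsByCode = new Map([...codes].map(([label, code]) => [code, label]));
const decoded = encoded.map((code) => labelsByCode.get(code));
```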

oneHotColumnArray

  • oneHotColumnArray(name: any, options: any): any
  • Returns one-hot encoded data

    example

    const csvData = [{
      'Country': 'Brazil',
      'Age': '44',
      'Salary': '72000',
      'Purchased': 'N',
    },
    {
      'Country': 'Mexico',
      'Age': '27',
      'Salary': '48000',
      'Purchased': 'Yes',
    },
    ...
    ];
    const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
    EncodedCSVDataSet.fitColumns({
      columns: [
        {
          name: 'Country',
          options: { strategy: 'onehot', },
        },
      ],
    });

    EncodedCSVDataSet.oneHotColumnArray('Country'); // =>
    // [ { Country_Brazil: 1, Country_Mexico: 0, Country_Ghana: 0 },
    //   { Country_Brazil: 0, Country_Mexico: 1, Country_Ghana: 0 },
    //   { Country_Brazil: 0, Country_Mexico: 0, Country_Ghana: 1 },
    //   ...]

    Parameters

    • name: any

      column name

    • options: any

    Returns any

    returns an array of objects from a one-hot encoded column

transformObject

  • transformObject(data: {}, options: {}): {}
  • transforms an object and replaces values that have been scaled or encoded

    example

    DataSet.transformObject({
      'Country': 'Brazil',
      'Age': '44',
      'Salary': '72000',
      'Purchased': 'N',
    }); // =>
    // {
    //   Country: 'Brazil',
    //   Age: 3.784189633918261,
    //   Salary: '72000',
    //   Purchased: 'N',
    //   Country_Brazil: 1,
    //   Country_Mexico: 0,
    //   Country_Ghana: 0
    // }

    Parameters

    • data: {}
      • [x: string]: any
    • options: {}

    Returns {}

    • [x: string]: any

Static columnArray

  • columnArray(name: string | number, options?: any): any
  • returns a new array of a selected column from an array of objects, can filter, scale and replace values

    example

    //column Array returns column of data by name
    // [ '44','27','30','38','40','35','','48','50', '37' ]
    const OriginalAgeColumn = dataset.columnArray('Age');

    Parameters

    • name: string | number

      csv column header, or JSON object property name

    • options: any = ...

    Returns any

Static columnMatrix

  • columnMatrix(vectors?: any[], data?: any[]): Matrix
  • returns a matrix of values by combining column arrays into a matrix

    example

    const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
    csvObj.columnMatrix([['col1',{parseInt:true}],['col2']]); // =>
    //[
    //  [1,5],
    //  [2,6],
    //]

    Parameters

    • vectors: any[] = ...
    • data: any[] = ...

    Returns Matrix

    a matrix of column values

Static encodeObject

  • encodeObject(data: Datum, options: { labels: string[]; name: string; prefix: string }): Datum
  • Converts an object into a one-hot encoded object

    example

    const labels = ['apple', 'orange', 'banana',];
    const prefix = 'fruit_';
    const name = 'fruit';
    const options = { labels, prefix, name, };
    const data = {
      fruit: 'apple',
    };
    EncodedCSVDataSet.encodeObject(data, options); // => { fruit_apple: 1, fruit_orange: 0, fruit_banana: 0, }

    Parameters

    • data: Datum

      object to encode

    • options: { labels: string[]; name: string; prefix: string }

      encoded object options

      • labels: string[]
      • name: string
      • prefix: string

    Returns Datum

    one hot encoded object

Static getBinaryValue

  • getBinaryValue(value?: string | boolean): 1 | 0
  • returns 0 or 1 depending on the input value

    example

    DataSet.getBinaryValue('true') // => 1
    DataSet.getBinaryValue('false') // => 0
    DataSet.getBinaryValue('No') // => 0
    DataSet.getBinaryValue(false) // => 0

    Parameters

    • value: string | boolean = ''

    Returns 1 | 0

    0 or 1 depending on truthiness of value

Static getTransforms

  • getTransforms(transforms?: DataSetTransform): FitColumnsOptions
  • Allows for fit transform short hand notation

    example

    DataSet.getTransforms({
      Age: ['scale',],
      Rating: ['label',],
    }); //=>
    // [
    //   {
    //     name: 'Age', options: { strategy: 'scale', },
    //   },
    //   {
    //     name: 'Rating', options: { strategy: 'label', },
    //   },
    // ];

    Parameters

    • transforms: DataSetTransform = ...

    Returns FitColumnsOptions

    returns fit columns, columns property
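
The shorthand expansion shown in the example can be sketched as follows. `expandTransforms` is a hypothetical helper; in particular, merging an optional second tuple element into `options` is an assumption made for illustration and may not match how the library treats extra options.

```javascript
// Hypothetical helper (not the library's code): expand
// { column: [strategy, extraOptions?] } into a fitColumns-style columns array.
function expandTransforms(shorthand) {
  return Object.entries(shorthand).map(([name, [strategy, extra = {}]]) => ({
    name,
    options: { strategy, ...extra }, // assumption: extra options are merged in
  }));
}

const columns = expandTransforms({ Age: ['scale'], Rating: ['label'] });
console.log(JSON.stringify(columns));
```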

Static mapToObject

  • mapToObject(mapObj?: Map<any, any>): any
  • returns a JavaScript Object from a Map (supports nested Map Objects)

    example

    const csvObj = new DataSet([{col1:1,col2:5},{col1:2,col2:6}]);
    csvObj.columnMatrix([['col1',{parseInt:true}],['col2']]); // =>
    //[
    //  [1,5],
    //  [2,6],
    //]

    Parameters

    • mapObj: Map<any, any> = ...

      Map to convert into JavaScript Object

    Returns any

    JavaScript Object converted from a Map
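
Converting a Map with nested Maps into a plain object is naturally recursive. The sketch below is a hypothetical standalone function, not the library's implementation; it assumes keys are strings or values that coerce sensibly to property names.

```javascript
// Hypothetical helper (not the library's code): recursively convert a Map,
// including any nested Map values, into a plain JavaScript object.
function mapToPlainObject(map) {
  const out = {};
  for (const [key, value] of map) {
    out[key] = value instanceof Map ? mapToPlainObject(value) : value;
  }
  return out;
}

const nested = new Map([
  ['Age', new Map([['min', 27], ['max', 50]])],
  ['count', 10],
]);
const plain = mapToPlainObject(nested);
console.log(JSON.stringify(plain));
```

A plain object can be serialized with `JSON.stringify`, which a Map cannot, so this kind of conversion is useful when exporting stored transforms.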

Static oneHotColumnArray

  • oneHotColumnArray(name: any, oneHotColumnArrayOptions: any): any
  • Parameters

    • name: any
    • oneHotColumnArrayOptions: any

    Returns any

Static oneHotDecoder

  • oneHotDecoder(name: any, options: any): any
  • Returns the original values decoded from a one-hot encoded column

    example

    const csvData = [{
      'Country': 'Brazil',
      'Age': '44',
      'Salary': '72000',
      'Purchased': 'N',
    },
    {
      'Country': 'Mexico',
      'Age': '27',
      'Salary': '48000',
      'Purchased': 'Yes',
    },
    ...
    ];
    const EncodedCSVDataSet = new ms.preprocessing.DataSet(csvData);
    EncodedCSVDataSet.fitColumns({
      columns: [
        {
          name: 'Country',
          options: { strategy: 'onehot', },
        },
      ],
    });

    EncodedCSVDataSet.oneHotDecoder('Country'); // =>
    // [ { Country: 'Brazil' },
    //   { Country: 'Mexico' },
    //   { Country: 'Ghana' },
    //   { Country: 'Mexico' },
    //   ...]

    Parameters

    • name: any

      column name

    • options: any

    Returns any

    returns an array of objects decoded from a one-hot encoded column

Static oneHotEncoder

  • oneHotEncoder(name: string, options: OneHotEncoderOptions): OneHotEncodedData
  • returns a new object of one hot encoded values

    example

    // [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana', 'Brazil' ]
    const originalCountry = dataset.columnArray('Country');

    // { originalCountry:
    //    { Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
    //      Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
    //      Country_Ghana: [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ] },
    // }
    const oneHotCountryColumn = dataset.oneHotEncoder('Country');

    see

    http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

    Parameters

    • name: string

      csv column header, or JSON object property name

    • options: OneHotEncoderOptions

    Returns OneHotEncodedData
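
One-hot encoding produces one indicator array per distinct value. The sketch below is a hypothetical standalone version; the `name_label` key format follows the example output above, while the function itself is not the library's implementation.

```javascript
// Hypothetical helper (not the library's code): one indicator array per
// distinct label, keyed as `${name}_${label}`.
function oneHotEncode(column, name) {
  const labels = [...new Set(column)]; // distinct values, first-seen order
  const encoded = {};
  for (const label of labels) {
    encoded[`${name}_${label}`] = column.map((value) => (value === label ? 1 : 0));
  }
  return encoded;
}

const countries = ['Brazil', 'Mexico', 'Ghana', 'Mexico'];
const oneHot = oneHotEncode(countries, 'Country');
console.log(oneHot);
```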

Static reverseColumnMatrix

  • reverseColumnMatrix(options?: ReverseColumnMatrixOptions): Data
  • returns an array of objects by applying labels to matrix of columns

    example

    const data = [{ Age: '44', Salary: '44' }, { Age: '27', Salary: '27' }]
    const AgeDataSet = new MS.DataSet(data);
    const dependentVariables = [ [ 'Age', ], [ 'Salary', ], ];
    const AgeSalMatrix = AgeDataSet.columnMatrix(dependentVariables); // =>
    // [ [ '44', '72000' ],
    //   [ '27', '48000' ] ];
    MS.DataSet.reverseColumnMatrix({vectors:AgeSalMatrix,labels:dependentVariables}); // => [{ Age: '44', Salary: '44' }, { Age: '27', Salary: '27' }]

    Parameters

    • options: ReverseColumnMatrixOptions = ...

    Returns Data

    an array of objects with properties derived from options.labels
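
Applying labels back onto a matrix amounts to zipping each row with the label list. `rowsFromMatrix` is a hypothetical helper; it assumes each label is wrapped in an array, as `dependentVariables` is in the example above.

```javascript
// Hypothetical helper (not the library's code): rebuild row objects from a
// matrix, using the first element of each label entry as the property name.
function rowsFromMatrix(vectors, labels) {
  return vectors.map((row) =>
    Object.fromEntries(labels.map(([label], i) => [label, row[i]])));
}

const matrix = [['44', '72000'], ['27', '48000']];
const rebuilt = rowsFromMatrix(matrix, [['Age'], ['Salary']]);
console.log(rebuilt);
```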

Static reverseColumnVector

  • reverseColumnVector(options?: ReverseColumnVectorOptions): Data
  • Parameters

    • options: ReverseColumnVectorOptions = ...

    Returns Data

Static selectColumns

  • selectColumns(names: any[], options?: any): any
  • returns a list of objects with only selected columns as properties

    example

    const data = [{ Age: '44', Salary: '44' , Height: '34' }, { Age: '27', Salary: '44' , Height: '50' }]
    const AgeDataSet = new MS.DataSet(data);
    const cols = [ 'Age', 'Salary' ];
    const selectedCols = AgeDataSet.selectColumns(cols); // => [{ Age: '44', Salary: '44' }, { Age: '27', Salary: '44' }]

    Parameters

    • names: any[]

      array of selected columns

    • options: any = ...

    Returns any

    an array of objects with properties derived from names
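
Column selection can be sketched as picking the named properties from every row. `pickColumns` is a hypothetical helper written for illustration, not the library's implementation.

```javascript
// Hypothetical helper (not the library's code): keep only the named
// properties of each row object.
function pickColumns(rows, names) {
  return rows.map((row) => Object.fromEntries(names.map((n) => [n, row[n]])));
}

const people = [
  { Age: '44', Salary: '44', Height: '34' },
  { Age: '27', Salary: '44', Height: '50' },
];
const selected = pickColumns(people, ['Age', 'Salary']);
console.log(selected);
```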