JSONM Manual


Data fetching

JSONM lets you describe, in JSON, how training data is loaded into your TensorFlow model, and it uses @jsonstack/data to construct that training data. The training data for your model can come from one or many data sources. JSONM also exposes the asynchronous getDataSet function, which can be used to retrieve data independently.

  1. Static Training Data - defining training data directly on your JML JSON Object on the dataset property
  2. Using JDS JSON Data - getModel’s dataset property can also be defined as a JDS JSON Object that returns training data from calling getDataSet
    1. JDS - JSON DataSet JSON spec
    2. JSON Training Data - dataset.data or dataset._data_static
    3. Dynamic Training Data
      1. Fetching JSON from a URL - dataset._data_url
      2. Fetching JSON from CSV/TSV from a URL - dataset._data_csv or dataset._data_tsv
      3. Fetching JSON from a custom JavaScript Promise - dataset._data_promise
    4. Transforming and Combining Multiple Data Sources - dataset.reducer

1. Static Training Data

The most straightforward way to define your training data is to assign it directly to the dataset property. When using static data, the dataset property is expected to be an array of JSON objects.

const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: [ //define static data
    {x:1,y:2,},
    {x:2,y:4,},
    {x:3,y:6,},
  ],
};
const Model = await getModel(JML); 

2. Using JDS JSON Data

Another way to set the training data for your model is to use JDS JSON in your dataset property. When getModel is called with JML and the dataset property is not an array of JSON objects, the dataset property is passed to the getDataSet function.

The getDataSet method expects to be called with a JDS (JSON DataSet) formatted Object.
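
The dispatch rule described above can be pictured with a small sketch. The names resolveDataset and fakeGetDataSet are hypothetical and exist only for illustration; they are not part of JSONM, and the real getModel may differ in detail.

```javascript
// Hypothetical helper illustrating the rule above; not part of the JSONM API.
// `getDataSetFn` stands in for JSONM's getDataSet so the sketch runs on its own.
async function resolveDataset(dataset, getDataSetFn) {
  // An array is treated as static training data and used as-is.
  if (Array.isArray(dataset)) return dataset;
  // Anything else is assumed to be a JDS object and is handed to getDataSet.
  return getDataSetFn(dataset);
}

// Stand-in for getDataSet that only understands the `data` property.
async function fakeGetDataSet(jds) {
  return jds.data || [];
}

// Both calls resolve to the same kind of value: an array of row objects.
resolveDataset([{ x: 1, y: 2 }], fakeGetDataSet).then((rows) => console.log(rows.length)); // 1
resolveDataset({ data: [{ x: 3, y: 6 }] }, fakeGetDataSet).then((rows) => console.log(rows.length)); // 1
```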

2.1 JDS

JDS Objects are defined as:

export type JDS = {
  name?: string;
  id?: string;
  reducer?: Reducer;
  pre_transform?: string | genericFunction;
  post_transform?: string | genericFunction;
  data?: Data;
  _data_static?: Data;
  _data_url?: string;
  _data_csv?: string;
  _data_tsv?: string;
  _data_csv_options?: CSVOptions;
  _data_promise?: genericFunction;
}
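
Several properties in the type above can each supply data. The type does not say which one wins if more than one is set, so a cautious approach is to define a single source per JDS object. The sketch below is a hypothetical helper (listDataSources is not part of JSONM, and 'rows.csv' is a made-up path) that reports which source properties a JDS object sets.

```javascript
// Source-bearing properties taken from the JDS type above.
const SOURCE_KEYS = [
  'reducer', 'data', '_data_static', '_data_url',
  '_data_csv', '_data_tsv', '_data_promise',
];

// Hypothetical helper (not part of JSONM): list which sources a JDS object sets.
function listDataSources(jds) {
  return SOURCE_KEYS.filter((key) => jds[key] !== undefined);
}

console.log(listDataSources({ _data_url: 'https://jsonplaceholder.typicode.com/posts' }));
// [ '_data_url' ]
console.log(listDataSources({ data: [{ x: 1, y: 2 }], _data_csv: 'rows.csv' }));
// [ 'data', '_data_csv' ]
```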

2.2 JSON Training Data

The simplest use case is defining static data with the JDS data or _data_static property.

import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
  data:[ //define static data
    {x:1,y:2,},
    {x:2,y:4,},
    {x:3,y:6,},
  ]
};

const JDS2 = {
  _data_static:[ //define static data
    {x:1,y:2,},
    {x:2,y:4,},
    {x:3,y:6,},
  ]
};

const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: JDS, // JDS2, or the result of await getDataSet(JDS), would work equally well
};

const Model = await getModel(JML); 

2.3 Dynamic Training Data

The primary use case for defining your training data with JDS JSON is when you need to fetch, combine and transform data from one or more sources.

2.3.1 Fetching JSON from a URL

JSONM can be used to fetch JSON from a remote location by defining _data_url.

import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
   _data_url: 'https://jsonplaceholder.typicode.com/posts'
};


const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: JDS
  /* resolves to:
    [
      {
        userId: 1,
        id: 1,
        title: "sunt auto"
      },
      {
        userId: 1,
        id: 2,
        title: "qui est esse",
        body: "est rerum"
      },
      ...
    ]
  */
};

const Model = await getModel(JML); 

2.3.2 Fetching JSON from CSV/TSV from a URL

JSONM can fetch a remote CSV or TSV file and convert it to JSON when you define _data_csv or _data_tsv. Both CSVs and TSVs accept additional loading options, which are defined on _data_csv_options.

import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
  _data_csv:'https://raw.githubusercontent.com/repetere/modelx-model/master/src/test/mock/data/iris_data.csv',
  _data_csv_options:{} //options passed to csvtojson module
};

const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: JDS 
  /* resolves to:
    [
      {
        sepal_length_cm: 5.1, sepal_width_cm: 3.5, petal_length_cm: 1.4, petal_width_cm: 0.2, plant: 'Iris-setosa'
      },
      ...
    ]
  */
};

const Model = await getModel(JML); 
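
To make the CSV-to-JSON step concrete, here is a minimal sketch of how a header row and data rows map to an array of objects. It is not how JSONM or csvtojson is implemented: csvToRows is a hypothetical function that assumes comma separators, a header row and no quoted fields.

```javascript
// Minimal sketch only: assumes a header row, comma separators and no quoted fields.
function csvToRows(csvText) {
  const [headerLine, ...lines] = csvText.trim().split('\n');
  const headers = headerLine.split(',');
  return lines.map((line) => {
    const cells = line.split(',');
    const row = {};
    headers.forEach((header, i) => {
      const cell = cells[i];
      // Keep numeric cells as numbers and everything else as strings.
      row[header] = cell !== '' && !Number.isNaN(Number(cell)) ? Number(cell) : cell;
    });
    return row;
  });
}

const sample = 'sepal_length_cm,sepal_width_cm,plant\n5.1,3.5,Iris-setosa';
const rows = csvToRows(sample);
// rows[0].sepal_length_cm === 5.1 and rows[0].plant === 'Iris-setosa'
console.log(rows[0].plant);
```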

2.3.3 Fetching JSON from a custom JavaScript Promise

JSONM can be used to fetch JSON from any asynchronous function or Promise by defining _data_promise. This allows JSONM to load data from user-defined functions.

import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
  _data_promise:new Promise((resolve,reject)=>{
    resolve([ 
      {x:1,y:2,},
      {x:2,y:4,},
      {x:3,y:6,},
    ])
  })
};

const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: JDS 
  /* resolves to:
    [ //define static data
      {x:1,y:2,},
      {x:2,y:4,},
      {x:3,y:6,},
    ]
  */
};

const Model = await getModel(JML); 
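
In practice the Promise usually comes from calling your own asynchronous loader. The sketch below follows the example above in assigning a Promise to _data_promise; loadRows and JDS_FROM_LOADER are hypothetical names, and the in-memory array stands in for a database query or file read.

```javascript
// Hypothetical loader: any async function that resolves to an array of rows.
async function loadRows() {
  // Stand-in for a database query, file read or other custom source.
  const raw = [[1, 2], [2, 4], [3, 6]];
  return raw.map(([x, y]) => ({ x, y }));
}

// Calling the function yields the Promise that is assigned to _data_promise.
const JDS_FROM_LOADER = {
  _data_promise: loadRows(),
};

JDS_FROM_LOADER._data_promise.then((rows) => {
  console.log(rows.length); // 3
});
```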

2.4 Transforming and Combining Multiple Data Sources

JSONM can resolve multiple dataset objects into training data by using a reducer function. When multiple reducer functions are defined, the output of one function is piped in as the input to the next.

Reducers iterate over the JDS JSON objects defined in the reducer.datasets property. Reducers can be nested to any depth for even more flexibility.

Reducer functions can be defined either as an asynchronous function, or as a string containing the body of an asynchronous function that returns the data you want.

export type reducerFunction = (datasetData:DataSets) => Promise<Data>;

export type Reducer = {
  reducer_function: string | reducerFunction | Array<string|reducerFunction>;
  name?: string;
  context?: any;
  datasets: Array<JDS|Data>;
}
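
When reducer_function is an array, the functions run as a pipeline. The sketch below shows that idea under one assumption that is not confirmed here: the first function receives the named datasets and each later function receives the previous function's result. runReducerPipeline, mergeRows and doubleY are hypothetical names, not JSONM's implementation.

```javascript
// Hypothetical runner (not JSONM's implementation): the first function receives
// the named datasets, and each later function receives the previous result.
async function runReducerPipeline(reducerFunctions, datasets) {
  let result = datasets;
  for (const fn of reducerFunctions) {
    result = await fn(result);
  }
  return result;
}

// Step 1: merge the rows of two named datasets into one array.
const mergeRows = async (datasets) => datasets.firstDS.concat(datasets.secondDS);
// Step 2: transform every row produced by step 1.
const doubleY = async (rows) => rows.map((row) => ({ ...row, y: row.y * 2 }));

runReducerPipeline([mergeRows, doubleY], {
  firstDS: [{ x: 1, y: 2 }],
  secondDS: [{ x: 2, y: 4 }],
}).then((rows) => console.log(rows));
// resolves to [{ x: 1, y: 4 }, { x: 2, y: 8 }]
```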

Reducers are a powerful and flexible way to combine data from multiple sources. To reference datasets in your reducer functions, you can either set a name for each dataset explicitly or rely on the automatically assigned name, dataset_${index}. Reducers can be used in the following way:

import { getModel, getDataSet} from '@jsonstack/jsonm'

function combineDataSets(datasets){
  return datasets.firstDS.map((datum,i)=> {
    return {
      ...datum,
      ...datasets.secondDS[i],
      combined_x: datum.x+datasets.secondDS[i].x2,
      combined_y: datum.y+datasets.secondDS[i].y2,
    }
  })
}
const JDS = {
  reducer:{
    reducer_function: combineDataSets,
    datasets:[
      {
        name:'firstDS',
        data:[
          {x:1,y:2,},
          {x:2,y:4,},
          {x:3,y:6,},
        ],
      },
      {
        name:'secondDS',
        data:[
          {x2:10,y2:20,},
          {x2:20,y2:40,},
          {x2:30,y2:60,},
        ],
      }
    ]
  }
};

const JML = {
  type, // required, e.g. 'regression'
  inputs, // required, e.g. ['x']
  outputs, // required, e.g. ['y']
  dataset: JDS 
  /* resolves to:
    [
      { x:1, y:2, x2:10, y2:20, combined_x:11, combined_y:22, },
      { x:2, y:4, x2:20, y2:40, combined_x:22, combined_y:44, },
      { x:3, y:6, x2:30, y2:60, combined_x:33, combined_y:66, },
    ]
  */
};

const Model = await getModel(JML); 
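
The naming rule for datasets can be sketched as follows. keyDatasetsByName is a hypothetical helper, not JSONM's implementation. It assumes that index is the zero-based position in reducer.datasets, which is not stated above, and it only handles datasets that carry inline data.

```javascript
// Hypothetical helper (not JSONM's implementation) illustrating the naming rule.
// Assumption: `index` is the zero-based position in reducer.datasets.
function keyDatasetsByName(datasets) {
  const named = {};
  datasets.forEach((dataset, index) => {
    // Use the explicit name when present, otherwise a generated one.
    const name = dataset.name || `dataset_${index}`;
    named[name] = dataset.data;
  });
  return named;
}

const named = keyDatasetsByName([
  { name: 'firstDS', data: [{ x: 1, y: 2 }] },
  { data: [{ x2: 10, y2: 20 }] }, // no name, so a generated one is used
]);
console.log(Object.keys(named)); // [ 'firstDS', 'dataset_1' ]
```

Setting explicit names, as in the firstDS and secondDS example, keeps reducer functions readable and avoids depending on the order of the datasets array.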

Next: Feature Engineering

