JSONM Manual
- Getting Started
- Working With Data
- Working With Models
- Advanced Topics
Data fetching
JSONM lets you describe in JSON how training data is loaded into your TensorFlow model, and uses @jsonstack/data to construct that training data. The data can come from one or many sources. JSONM also exposes the asynchronous getDataSet function, which can be used to retrieve data independently.
- Static Training Data - defining training data directly on your JML JSON Object on the dataset property
- Using JDS JSON Data - getModel's dataset property can also be defined as a JDS JSON Object that returns training data from calling getDataSet
  - JDS - JSON DataSet JSON spec
  - JSON Training Data - dataset.data or dataset._data_static
  - Dynamic Training Data
    - Fetching JSON from a URL - dataset._data_url
    - Fetching JSON from CSV/TSV from a URL - dataset._data_csv or dataset._data_tsv
    - Fetching JSON from a custom JavaScript Promise - dataset._data_promise
- Transforming and Combining Multiple Data Sources - dataset.reducer
1. Static Training Data
The most straightforward way to define your training data is to assign it directly to the dataset property. When using static data, the dataset property must be an array of JSON objects.
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: [ //define static data
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
],
};
const Model = await getModel(JML);
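Because static data must be an array of JSON objects, it can help to check the rows before handing them to getModel. The following is a minimal sketch of such a check; the helper name findInvalidRows is hypothetical and is not part of JSONM.

```javascript
// Hypothetical helper (not part of JSONM): returns the indexes of rows
// that are not plain objects or that lack one of the required columns.
function findInvalidRows(dataset, columns) {
  if (!Array.isArray(dataset)) {
    throw new TypeError('static dataset must be an array of objects');
  }
  const invalid = [];
  dataset.forEach((row, index) => {
    const isObject = row !== null && typeof row === 'object' && !Array.isArray(row);
    const hasAllColumns = isObject && columns.every((column) => column in row);
    if (!hasAllColumns) invalid.push(index);
  });
  return invalid;
}

const dataset = [
  { x: 1, y: 2 },
  { x: 2, y: 4 },
  { x: 3 }, // missing the output column 'y'
];

console.log(findInvalidRows(dataset, ['x', 'y'])); // [ 2 ]
```

Passing the combined list of inputs and outputs as the columns argument covers every property the model is expected to read.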
2. Using JDS JSON Data
Another way to set the training data for your model is to use JDS JSON in your dataset property. When getModel is called with a JML object whose dataset property is not an array of JSON objects, the dataset property is passed to the getDataSet function.
The getDataSet function expects to be called with a JDS (JSON DataSet) formatted object.
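The branching rule described above can be sketched roughly as follows. This only illustrates the documented behavior: resolveDataset and stubGetDataSet are hypothetical names, the stub stands in for JSONM's getDataSet so the sketch runs without the library, and JSONM's internal implementation may differ.

```javascript
// Illustration of the branching rule; resolveDataset is a hypothetical name.
async function resolveDataset(dataset, getDataSet) {
  // An array is treated as static training data and used as-is.
  if (Array.isArray(dataset)) return dataset;
  // Anything else is assumed to be a JDS object and is delegated.
  return getDataSet(dataset);
}

// Stand-in for JSONM's getDataSet; it only understands the `data` property.
async function stubGetDataSet(jds) {
  return jds.data;
}

async function demo() {
  const fromArray = await resolveDataset([{ x: 1, y: 2 }], stubGetDataSet);
  const fromJds = await resolveDataset({ data: [{ x: 2, y: 4 }] }, stubGetDataSet);
  console.log(fromArray, fromJds);
}

demo();
```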
2.1 JDS
JDS Objects are defined as:
export type JDS = {
name?: string;
id?: string;
reducer?: Reducer;
pre_transform?: string | genericFunction;
post_transform?: string | genericFunction;
data?: Data;
_data_static?: Data;
_data_url?: string;
_data_csv?: string;
_data_tsv?: string;
_data_csv_options?: CSVOptions;
_data_promise?: genericFunction;
}
2.2 JSON Training Data
The simplest use case is defining static data with the JDS data or _data_static property:
import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
data:[ //define static data
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
]
};
const JDS2 = {
_data_static:[ //define static data
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
]
};
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: JDS, // or JDS2, or the already resolved rows: await getDataSet(JDS)
};
const Model = await getModel(JML);
2.3 Dynamic Training Data
The primary use case for defining your training data with JDS JSON is fetching, combining and transforming data from one or multiple sources.
2.3.1 Fetching JSON from a URL
JSONM can fetch JSON from a remote location when you define _data_url:
import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
_data_url: 'https://jsonplaceholder.typicode.com/posts'
};
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: JDS
/* resolves to:
[
{
userId: 1,
id: 1,
title: "sunt auto"
},
{
userId: 1,
id: 2,
title: "qui est esse",
body: "est rerum"
},
...
]
*/
};
const Model = await getModel(JML);
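Conceptually, loading from _data_url amounts to requesting the URL and parsing the response body as a JSON array. The sketch below shows that idea under stated assumptions: loadJsonFromUrl and stubFetch are hypothetical names, and this is not JSONM's loader. The fetch function is injectable so the example can run offline.

```javascript
// Hypothetical helper: request a URL and parse the body as a JSON array.
async function loadJsonFromUrl(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed with status ${response.status}`);
  }
  const rows = await response.json();
  if (!Array.isArray(rows)) {
    throw new TypeError('expected the URL to return a JSON array of objects');
  }
  return rows;
}

// Offline stand-in for fetch that returns two rows shaped like the posts above.
async function stubFetch() {
  return {
    ok: true,
    status: 200,
    json: async () => [
      { userId: 1, id: 1, title: 'first' },
      { userId: 1, id: 2, title: 'second' },
    ],
  };
}

loadJsonFromUrl('https://jsonplaceholder.typicode.com/posts', stubFetch)
  .then((rows) => console.log(rows.length)); // 2
```

Omitting the second argument makes the helper use the global fetch, which is available in browsers and in Node.js 18 or later.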
2.3.2 Fetching JSON from CSV/TSV from a URL
JSONM can fetch JSON from a remote CSV or TSV file when you define _data_csv or _data_tsv. Both CSVs and TSVs accept additional loading options, which are defined on _data_csv_options.
import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
_data_csv:'https://raw.githubusercontent.com/repetere/modelx-model/master/src/test/mock/data/iris_data.csv',
_data_csv_options:{} //options passed to csvtojson module
};
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: JDS
/* resolves to:
[
{
sepal_length_cm: 5.1, sepal_width_cm: 3.5, petal_length_cm: 1.4, petal_width_cm: 0.2, plant: 'Iris-setosa'
},
...
]
*/
};
const Model = await getModel(JML);
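To make the CSV-to-JSON step concrete, here is a deliberately simplified conversion from delimited text to an array of row objects. It is a sketch only: parseDelimited is a hypothetical helper, it does not handle quoted fields or escaped delimiters, and it is not the parser JSONM uses.

```javascript
// Hypothetical, simplified converter: header row becomes the object keys.
function parseDelimited(text, delimiter = ',') {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const headers = lines[0].split(delimiter).map((header) => header.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(delimiter);
    const row = {};
    headers.forEach((header, i) => {
      const cell = (cells[i] ?? '').trim();
      // Coerce numeric-looking cells to numbers; keep everything else as text.
      row[header] = cell !== '' && !Number.isNaN(Number(cell)) ? Number(cell) : cell;
    });
    return row;
  });
}

const csv = [
  'sepal_length_cm,sepal_width_cm,petal_length_cm,petal_width_cm,plant',
  '5.1,3.5,1.4,0.2,Iris-setosa',
].join('\n');

console.log(parseDelimited(csv));
// [ { sepal_length_cm: 5.1, sepal_width_cm: 3.5, petal_length_cm: 1.4,
//     petal_width_cm: 0.2, plant: 'Iris-setosa' } ]
```

Tab-separated text can be handled by the same sketch by passing '\t' as the delimiter.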
2.3.3 Fetching JSON from a custom JavaScript Promise
JSONM can fetch JSON from any asynchronous function or Promise when you define _data_promise. This allows JSONM to load data from user-defined functions.
import { getModel, getDataSet} from '@jsonstack/jsonm'
const JDS = {
_data_promise:new Promise((resolve,reject)=>{
resolve([
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
])
})
};
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: JDS
/* resolves to:
[
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
]
*/
};
const Model = await getModel(JML);
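The JDS type declares _data_promise as a function, while the example above passes a Promise instance. The sketch below shows one way a loader could accept both forms. Whether JSONM accepts both is an assumption, and resolveRows and loadRows are hypothetical names.

```javascript
// Hypothetical helper: accepts either a Promise or a function returning data
// (synchronously or asynchronously) and always yields the resolved rows.
async function resolveRows(source) {
  const value = typeof source === 'function' ? source() : source;
  return await value;
}

// A user-defined async loader; in a real project this might query a database.
async function loadRows() {
  return [
    { x: 1, y: 2 },
    { x: 2, y: 4 },
    { x: 3, y: 6 },
  ];
}

// Both forms resolve to plain arrays of rows.
resolveRows(loadRows).then((rows) => console.log('from function:', rows.length));
resolveRows(Promise.resolve([{ x: 1, y: 2 }])).then((rows) => console.log('from promise:', rows.length));
```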
3. Combining Multiple Data Sources
JSONM can resolve multiple dataset objects into training data by using a reducer function. When several reducer functions are defined, the output of each function is piped in as the input of the next.
Reducers iterate over the JDS JSON objects defined in the reducer.datasets property. Reducers can be nested to any depth for even more flexibility.
Reducer functions can be defined either as a string containing the body of an asynchronous function that returns the data you want, or as an asynchronous function.
export type reducerFunction = (datasetData:DataSets) => Promise<Data>;
export type Reducer = {
reducer_function: string | reducerFunction | Array<string|reducerFunction>;
name?: string;
context?: any;
datasets: Array<JDS|Data>;
}
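As a rough illustration of piping, the sketch below runs an array of reducer functions in order, feeding each result into the next, and accepts string function bodies as well as functions. It is not JSONM's implementation: pipeReducers and toReducerFunction are hypothetical names, and the assumption that a string body receives its input as a parameter named datasets is made only for this example.

```javascript
// AsyncFunction is the standard (non-global) constructor for async functions.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Turn a string holding an async function body into a callable function.
// Assumption: the body receives its input as a parameter named `datasets`.
function toReducerFunction(fn) {
  return typeof fn === 'string' ? new AsyncFunction('datasets', fn) : fn;
}

// Run each function in order, feeding every result into the next one.
async function pipeReducers(reducerFunctions, initialInput) {
  const list = Array.isArray(reducerFunctions) ? reducerFunctions : [reducerFunctions];
  let current = initialInput;
  for (const fn of list) {
    current = await toReducerFunction(fn)(current);
  }
  return current;
}

const steps = [
  // Step 1 (function): merge the two named datasets into one array.
  async (datasets) => [...datasets.a, ...datasets.b],
  // Step 2 (string body): add a derived column to each merged row.
  'return datasets.map((row) => ({ ...row, doubled: row.x * 2 }));',
];

pipeReducers(steps, { a: [{ x: 1 }], b: [{ x: 2 }] })
  .then((rows) => console.log(rows));
// [ { x: 1, doubled: 2 }, { x: 2, doubled: 4 } ]
```

Building functions from strings evaluates arbitrary code, so a sketch like this should only ever be given strings from a trusted source.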
Reducers are a powerful and flexible way to combine data from multiple sources. To reference datasets in your reducer functions, you can either explicitly set a name for each dataset or rely on the dataset_${index} name that is assigned automatically. Reducers can be used in the following way:
import { getModel, getDataSet} from '@jsonstack/jsonm'
async function combineDataSets(datasets){ // async, so it returns a Promise as the reducerFunction type expects
return datasets.firstDS.map((datum,i)=> {
return {
...datum,
...datasets.secondDS[i],
combined_x: datum.x+datasets.secondDS[i].x2,
combined_y: datum.y+datasets.secondDS[i].y2,
}
})
}
const JDS = {
reducer:{
reducer_function: combineDataSets,
datasets:[
{
name:'firstDS',
data:[
{x:1,y:2,},
{x:2,y:4,},
{x:3,y:6,},
],
},
{
name:'secondDS',
data:[
{x2:10,y2:20,},
{x2:20,y2:40,},
{x2:30,y2:60,},
],
}
]
}
};
const JML = {
type, // required, e.g. 'regression'
inputs, // required, e.g. ['x']
outputs, // required, e.g. ['y']
dataset: JDS
/* resolves to:
[
{ x:1, y:2, x2:10, y2:20, combined_x:11, combined_y:22, },
{ x:2, y:4, x2:20, y2:40, combined_x:22, combined_y:44, },
{ x:3, y:6, x2:30, y2:60, combined_x:33, combined_y:66, },
]
*/
};
const Model = await getModel(JML);
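The naming rule for datasets can be illustrated with a small standalone sketch: an explicit name wins, and otherwise a dataset_${index} key is used. The helper keyDatasets is hypothetical, and the zero-based index is an assumption rather than something confirmed for JSONM.

```javascript
// Hypothetical sketch of how datasets might be keyed before a reducer runs.
function keyDatasets(datasets) {
  const keyed = {};
  datasets.forEach((dataset, index) => {
    // Assumption: unnamed datasets are numbered from zero.
    const key = dataset.name || `dataset_${index}`;
    keyed[key] = dataset.data;
  });
  return keyed;
}

const keyed = keyDatasets([
  { name: 'firstDS', data: [{ x: 1, y: 2 }] },
  { data: [{ x2: 10, y2: 20 }] }, // no name, so it becomes dataset_1
]);

// A reducer can then reference both entries by key.
const combined = keyed.firstDS.map((datum, i) => ({
  ...datum,
  ...keyed.dataset_1[i],
  combined_x: datum.x + keyed.dataset_1[i].x2,
  combined_y: datum.y + keyed.dataset_1[i].y2,
}));

console.log(combined);
// [ { x: 1, y: 2, x2: 10, y2: 20, combined_x: 11, combined_y: 22 } ]
```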
Next: Feature Engineering