public class DatasetFactory extends Object
Dataset construction is based on three major concepts: a partition upstream, context and data. A partition upstream is a data source that is assumed to be available at all times, regardless of node failures and rebalancing events. A partition context is a part of a partition that is maintained during the whole computation process and kept in reliable storage so that, like an upstream, it stays available and consistent regardless of node failures and rebalancing events. A partition data is a part of a partition that is maintained during a computation process in unreliable local storage, such as heap, off-heap or GPU memory, on the node where the current computation is performed. Partition data can therefore be lost as a result of a node failure or rebalancing, but it can be restored from an upstream and a partition context.
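The recovery relationship between the three concepts can be illustrated with a small, library-independent sketch. The class below is hypothetical and does not use the GridGain API. It assumes an in-memory list as the upstream, a single value computed once from the upstream as the context, and a local array as the data, which can be dropped and rebuilt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model of a single partition; not part of the GridGain API. */
final class PartitionSketch {
    /** Upstream: assumed to be always available (here, an unmodifiable list). */
    private final List<Integer> upstream;

    /** Context: small state computed once from the upstream and assumed to survive failures. */
    private final int minValue;

    /** Data: unreliable local state; null means it has been lost. */
    private int[] data;

    /** Number of times the data has been built or rebuilt. */
    private int builds;

    PartitionSketch(List<Integer> upstream) {
        this.upstream = Collections.unmodifiableList(new ArrayList<>(upstream));
        this.minValue = upstream.stream().mapToInt(Integer::intValue).min().orElse(0);
    }

    /** Simulates a node failure or rebalancing event: the local data disappears. */
    void loseData() {
        data = null;
    }

    /** Returns the partition data, restoring it from the upstream and the context if needed. */
    int[] data() {
        if (data == null) {
            data = upstream.stream().mapToInt(v -> v - minValue).toArray();
            builds++;
        }
        return data;
    }

    int builds() {
        return builds;
    }

    public static void main(String[] args) {
        PartitionSketch part = new PartitionSketch(Arrays.asList(4, 6, 9));
        System.out.println(Arrays.toString(part.data())); // [0, 2, 5]
        part.loseData();
        System.out.println(Arrays.toString(part.data())); // [0, 2, 5], rebuilt
        System.out.println("builds: " + part.builds());   // builds: 2
    }
}
```

The point of the sketch is only that losing the data is harmless as long as the upstream and the context remain available.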
A partition context and data are built on top of an upstream using the specified builders, PartitionContextBuilder and PartitionDataBuilder respectively. To build a generic dataset, the following approach is used:
Dataset<C, D> dataset = DatasetFactory.create( ignite, cache, partitionContextBuilder, partitionDataBuilder );
In addition to the generic building method create, this factory provides methods for creating specific dataset types, such as createSimpleDataset, which creates a SimpleDataset, and createSimpleLabeledDataset, which creates a SimpleLabeledDataset.
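As a rough illustration of how a generic creation method and a specialized one can relate, the following sketch models the idea in plain Java. All names here (`LocalFactorySketch`, `ContextBuilder`, `DataBuilder`, `Part`, `createSimple`) are hypothetical stand-ins, not the GridGain interfaces, and the round-robin split of the upstream map is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-ins for the builder concepts; not the GridGain interfaces. */
final class LocalFactorySketch {
    /** Builds a partition context from the upstream entries of one partition. */
    interface ContextBuilder<K, V, C> {
        C build(Map<K, V> upstreamPart);
    }

    /** Builds partition data from the upstream entries and the already built context. */
    interface DataBuilder<K, V, C, D> {
        D build(Map<K, V> upstreamPart, C ctx);
    }

    /** One partition: its context and its data. */
    static final class Part<C, D> {
        final C ctx;
        final D data;

        Part(C ctx, D data) {
            this.ctx = ctx;
            this.data = data;
        }
    }

    /** Generic creation: splits the upstream map into parts and runs both builders on each part. */
    static <K extends Comparable<K>, V, C, D> List<Part<C, D>> create(
        Map<K, V> upstream, int partitions,
        ContextBuilder<K, V, C> ctxBuilder, DataBuilder<K, V, C, D> dataBuilder) {

        List<Map<K, V>> parts = new ArrayList<>();
        for (int i = 0; i < partitions; i++)
            parts.add(new TreeMap<K, V>());

        // Round-robin over sorted keys; an assumption for this sketch only.
        int idx = 0;
        for (Map.Entry<K, V> e : new TreeMap<K, V>(upstream).entrySet())
            parts.get(idx++ % partitions).put(e.getKey(), e.getValue());

        List<Part<C, D>> res = new ArrayList<>();
        for (Map<K, V> part : parts) {
            C ctx = ctxBuilder.build(part);
            res.add(new Part<C, D>(ctx, dataBuilder.build(part, ctx)));
        }
        return res;
    }

    /** Specialized creation: fixes the context to the partition size and the data to double[]. */
    static <K extends Comparable<K>> List<Part<Integer, double[]>> createSimple(
        Map<K, Double> upstream, int partitions) {
        return LocalFactorySketch.<K, Double, Integer, double[]>create(
            upstream, partitions,
            part -> part.size(),
            (part, size) -> part.values().stream().mapToDouble(Double::doubleValue).toArray());
    }

    public static void main(String[] args) {
        Map<Integer, Double> upstream = new HashMap<>();
        for (int i = 1; i <= 5; i++)
            upstream.put(i, (double) i);

        for (Part<Integer, double[]> part : createSimple(upstream, 2))
            System.out.println(part.ctx + " entries, data length " + part.data.length);
    }
}
```

Here `createSimple` plays the role that the specialized methods play relative to `create`: it chooses the builders itself so the caller does not have to.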
See Also: Dataset, PartitionContextBuilder, PartitionDataBuilder
Constructor and Description |
---|
DatasetFactory() |
Modifier and Type | Method and Description |
---|---|
static <K,V,C extends Serializable,D extends AutoCloseable> |
create(DatasetBuilder<K,V> datasetBuilder,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
PartitionDataBuilder<K,V,C,D> partDataBuilder,
LearningEnvironment environment)
Creates a new instance of distributed dataset using the specified
partCtxBuilder and partDataBuilder . |
static <K,V,C extends Serializable,D extends AutoCloseable> |
create(DatasetBuilder<K,V> datasetBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of distributed dataset using the specified
partCtxBuilder and partDataBuilder . |
static <K,V,C extends Serializable,D extends AutoCloseable> |
create(Ignite ignite,
IgniteCache<K,V> upstreamCache,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
PartitionDataBuilder<K,V,C,D> partDataBuilder,
LearningEnvironment environment)
Creates a new instance of distributed dataset using the specified
partCtxBuilder and partDataBuilder . |
static <K,V,C extends Serializable,D extends AutoCloseable> |
create(Ignite ignite,
IgniteCache<K,V> upstreamCache,
PartitionContextBuilder<K,V,C> partCtxBuilder,
PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of distributed dataset using the specified
partCtxBuilder and partDataBuilder . |
static <K,V,C extends Serializable,D extends AutoCloseable> |
create(Map<K,V> upstreamMap,
LearningEnvironmentBuilder envBuilder,
int partitions,
PartitionContextBuilder<K,V,C> partCtxBuilder,
PartitionDataBuilder<K,V,C,D> partDataBuilder,
LearningEnvironment environment)
Creates a new instance of local dataset using the specified
partCtxBuilder and partDataBuilder . |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleDataset(DatasetBuilder<K,V> datasetBuilder,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed
SimpleDataset using the specified partCtxBuilder and featureExtractor . |
static <K,V,CO extends Serializable> |
createSimpleDataset(DatasetBuilder<K,V> datasetBuilder,
LearningEnvironmentBuilder envBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed
SimpleDataset using the specified featureExtractor . |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleDataset(Ignite ignite,
IgniteCache<K,V> upstreamCache,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed
SimpleDataset using the specified partCtxBuilder and featureExtractor . |
static <K,V,CO extends Serializable> |
createSimpleDataset(Ignite ignite,
IgniteCache<K,V> upstreamCache,
LearningEnvironmentBuilder envBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed
SimpleDataset using the specified featureExtractor . |
static <K,V,CO extends Serializable> |
createSimpleDataset(Ignite ignite,
IgniteCache<K,V> upstreamCache,
Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed
SimpleDataset using the specified featureExtractor . |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleDataset(Map<K,V> upstreamMap,
int partitions,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of local
SimpleDataset using the specified partCtxBuilder and featureExtractor . |
static <K,V,CO extends Serializable> |
createSimpleDataset(Map<K,V> upstreamMap,
int partitions,
LearningEnvironmentBuilder envBuilder,
Preprocessor<K,V> featureExtractor)
Creates a new instance of local
SimpleDataset using the specified featureExtractor . |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> vectorizer)
Creates a new instance of distributed
SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
static <K,V,CO extends Serializable> |
createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder,
LearningEnvironmentBuilder envBuilder,
Preprocessor<K,V> vectorizer)
Creates a new instance of distributed
SimpleLabeledDataset using the specified vectorizer. |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleLabeledDataset(Ignite ignite,
IgniteCache<K,V> upstreamCache,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> vectorizer)
Creates a new instance of distributed
SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
static <K,V,CO extends Serializable> |
createSimpleLabeledDataset(Ignite ignite,
LearningEnvironmentBuilder envBuilder,
IgniteCache<K,V> upstreamCache,
Preprocessor<K,V> vectorizer)
Creates a new instance of distributed
SimpleLabeledDataset using the specified vectorizer. |
static <K,V,C extends Serializable,CO extends Serializable> |
createSimpleLabeledDataset(Map<K,V> upstreamMap,
int partitions,
LearningEnvironmentBuilder envBuilder,
PartitionContextBuilder<K,V,C> partCtxBuilder,
Preprocessor<K,V> vectorizer)
Creates a new instance of local
SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. |
static <K,V,CO extends Serializable> |
createSimpleLabeledDataset(Map<K,V> upstreamMap,
LearningEnvironmentBuilder envBuilder,
int partitions,
Preprocessor<K,V> vectorizer)
Creates a new instance of local
SimpleLabeledDataset using the specified vectorizer. |
public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of a partition data.
Parameters:
envBuilder - Learning environment builder.
datasetBuilder - Dataset builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(DatasetBuilder<K,V> datasetBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of a partition data.
Parameters:
datasetBuilder - Dataset builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of a partition data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Ignite ignite, IgniteCache<K,V> upstreamCache, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder)
Creates a new instance of distributed dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any Ignite Cache based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of a partition data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
partCtxBuilder - Partition context builder.
partDataBuilder - Partition data builder.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. This method determines partition data to be SimpleDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed SimpleDataset using the specified partCtxBuilder and featureExtractor. This method determines partition data to be SimpleDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method determines partition data to be SimpleLabeledDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of distributed SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method determines partition data to be SimpleLabeledDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed SimpleDataset using the specified featureExtractor. This method determines partition context to be EmptyContext and partition data to be SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed SimpleDataset using the specified featureExtractor. This method determines partition context to be EmptyContext and partition data to be SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Ignite ignite, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> featureExtractor)
Creates a new instance of distributed SimpleDataset using the specified featureExtractor. This method determines partition context to be EmptyContext and partition data to be SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(DatasetBuilder<K,V> datasetBuilder, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of distributed SimpleLabeledDataset using the specified vectorizer. This method determines partition context to be EmptyContext and partition data to be SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
datasetBuilder - Dataset builder.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(Ignite ignite, LearningEnvironmentBuilder envBuilder, IgniteCache<K,V> upstreamCache, Preprocessor<K,V> vectorizer)
Creates a new instance of distributed SimpleLabeledDataset using the specified vectorizer. This method determines partition context to be EmptyContext and partition data to be SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
ignite - Ignite instance.
upstreamCache - Ignite Cache with upstream data.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,C extends Serializable,D extends AutoCloseable> Dataset<C,D> create(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, PartitionContextBuilder<K,V,C> partCtxBuilder, PartitionDataBuilder<K,V,C,D> partDataBuilder, LearningEnvironment environment)
Creates a new instance of local dataset using the specified partCtxBuilder and partDataBuilder. This is the generic method that allows creating any local, Map based dataset with any desired partition context and data.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
D - Type of a partition data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
partCtxBuilder - Partition context builder.
envBuilder - Learning environment builder.
partDataBuilder - Partition data builder.
environment - Local learning environment.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleDataset<C> createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of local SimpleDataset using the specified partCtxBuilder and featureExtractor. This method determines partition data to be SimpleDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,C extends Serializable,CO extends Serializable> SimpleLabeledDataset<C> createSimpleLabeledDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, PartitionContextBuilder<K,V,C> partCtxBuilder, Preprocessor<K,V> vectorizer)
Creates a new instance of local SimpleLabeledDataset using the specified partCtxBuilder and vectorizer. This method determines partition data to be SimpleLabeledDatasetData, but allows using any desired type of partition context.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
C - Type of a partition context.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
partCtxBuilder - Partition context builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.

public static <K,V,CO extends Serializable> SimpleDataset<EmptyContext> createSimpleDataset(Map<K,V> upstreamMap, int partitions, LearningEnvironmentBuilder envBuilder, Preprocessor<K,V> featureExtractor)
Creates a new instance of local SimpleDataset using the specified featureExtractor. This method determines partition context to be EmptyContext and partition data to be SimpleDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
featureExtractor - Feature extractor used to extract features and build SimpleDatasetData.

public static <K,V,CO extends Serializable> SimpleLabeledDataset<EmptyContext> createSimpleLabeledDataset(Map<K,V> upstreamMap, LearningEnvironmentBuilder envBuilder, int partitions, Preprocessor<K,V> vectorizer)
Creates a new instance of local SimpleLabeledDataset using the specified vectorizer. This method determines partition context to be EmptyContext and partition data to be SimpleLabeledDatasetData.
Type Parameters:
K - Type of a key in upstream data.
V - Type of a value in upstream data.
Parameters:
upstreamMap - Map with upstream data.
partitions - Number of partitions the upstream Map will be divided into.
envBuilder - Learning environment builder.
vectorizer - Upstream vectorizer used to extract features and labels and build SimpleLabeledDatasetData.
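The vectorizer parameter is described as extracting features and labels from upstream entries. As a rough, library-independent illustration of that step, the sketch below turns a map of rows into a flat feature array and a label array. The class `LabeledDataSketch`, the row-major layout, and the convention that the last element of each row is the label are all assumptions made for this example. They are not taken from SimpleLabeledDatasetData or the Preprocessor interface.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical container for extracted features and labels; not the GridGain class. */
final class LabeledDataSketch {
    /** Features in row-major order: rows * cols values. */
    final double[] features;

    /** One label per row. */
    final double[] labels;

    /** Number of rows. */
    final int rows;

    LabeledDataSketch(double[] features, double[] labels, int rows) {
        this.features = features;
        this.labels = labels;
        this.rows = rows;
    }

    /**
     * Builds labeled data from upstream rows, visiting keys in ascending order.
     * Assumes every row has the same length and that its last element is the label.
     */
    static LabeledDataSketch fromUpstream(Map<Integer, double[]> upstream) {
        TreeMap<Integer, double[]> sorted = new TreeMap<>(upstream);
        int rows = sorted.size();
        int cols = rows == 0 ? 0 : sorted.firstEntry().getValue().length - 1;

        double[] features = new double[rows * cols];
        double[] labels = new double[rows];

        int r = 0;
        for (double[] row : sorted.values()) {
            System.arraycopy(row, 0, features, r * cols, cols); // feature columns
            labels[r] = row[cols];                              // last column is the label
            r++;
        }
        return new LabeledDataSketch(features, labels, rows);
    }

    public static void main(String[] args) {
        Map<Integer, double[]> upstream = new HashMap<>();
        upstream.put(1, new double[] {1.0, 2.0, 10.0});
        upstream.put(2, new double[] {3.0, 4.0, 20.0});

        LabeledDataSketch data = fromUpstream(upstream);
        System.out.println(data.rows + " rows, " + data.features.length + " feature values");
    }
}
```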
GridGain In-Memory Computing Platform : ver. 8.9.14 Release Date : November 5 2024