v1.3.1

Introduction

Ampool 1.3.1 is the first general release of Ampool, aimed at users who want analytics on data generated by their applications. It is designed to sit close to applications and/or compute clusters, serving data for both analytics and low-latency applications.

What's New in 1.3.1

This release introduces FTable, a new immutable data abstraction. The highlights of this release are:

Core data store

  • FTable: A new table type that supports Create, Append and Scan operations
  • Local tier store for FTable, to 'tier' data onto the Ampool cluster's local disks when the data volume exceeds the available capacity of the memory tier.
  • Support for HDFS as a tier store for FTable.
  • Support for S3 as a tier store for FTable.

Interfaces & APIs

  • CLI (MASH): Added FTable support for the basic operations append and scan. Top-level commands are also refactored to use the generic 'table' instead of 'mtable'. For a complete list, refer to the Table Command Reference.

Connectors

  • Spark: Updated to use Scala 2.11 and Spark 2.1.0
  • Hive: Added option to specify data for FTable, which is now the default table type
  • Kafka: ampool-connect-kafka is a Kafka sink connector that stores data from Kafka topics in the corresponding Ampool table (MTable)

In addition, several internal enhancements improve the performance of insert, update and scan operations and make data recovery more efficient for larger datasets.

Package Contents

The following packages are available for download from Ampool's (S3) website:

  • Ampool Base Package (ampool-1.3.1.tar.gz): Includes Ampool core (MTable, FTable, CoProcessors, Local DiskStore for recovery, etc) and core interfaces (MASH & Java API)

  • Ampool Compute Connectors:
      • Spark (ampool-spark-1.3.1.tar.gz)
      • Hive (ampool-hive-1.3.1.tar.gz)
      • Kafka (ampool-connect-kafka-1.3.1.tar.gz)

Installing Ampool v1.3.1

Core Ampool Server & Locator

  • Untar the binaries in a new installation directory. After extracting the contents from the package, you should see the following directory structure:
bin
config
docs
examples
lib
tools
  • To launch Ampool services, start the command-line utility MASH (Memory Analytics Shell) by running the following from the Ampool installation directory (ampool-home):
$ <ampool-home>/bin/mash
mash>

Type 'help' for a list of commands.

For a detailed explanation of Ampool services and commands, refer to the README in the main directory.
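As a quick sanity check after extraction, the directory listing above can be verified with a small shell helper. This is a minimal sketch, not part of the Ampool distribution: the function name check_layout and the paths in the usage note are hypothetical, and only the six directory names come from these notes.

```shell
#!/bin/sh
# Sketch: confirm that an extracted Ampool home contains the directories
# listed in these release notes. check_layout is a hypothetical helper.

# check_layout AMPOOL_HOME
# Prints each missing directory to stderr; returns 0 only if all are present.
check_layout() {
    home=$1
    missing=0
    for d in bin config docs examples lib tools; do
        if [ ! -d "$home/$d" ]; then
            echo "missing: $home/$d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, after running tar -xzf ampool-1.3.1.tar.gz -C /opt/ampool (an illustrative target directory), call check_layout on the extracted directory before starting bin/mash from it.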

Connectors

  • Untar the Ampool connector packages on the Ampool client nodes (ampool-spark-1.3.1.tar.gz, ampool-hive-1.3.1.tar.gz, ampool-connect-kafka-1.3.1.tar.gz)

  • Refer to the README in the respective packages to install and use the Ampool connectors with Spark and Hive.

Upgrading to Ampool v1.3.1

Core Ampool Server & Locator

  • Stop the existing Ampool Server(s) and Locator(s) using the MASH CLI.

  • Untar the new Ampool core package and start the Ampool Server(s) and Locator(s) using the new binaries.

  • Make sure to provide the previous version's server and locator directories using the --dir option in the MASH CLI when starting the Server(s) and Locator(s).
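The restart step can be sketched as a helper that prints the commands to type at the mash> prompt. Only the --dir option is confirmed by these notes. The start locator and start server command names and the --name option are assumptions based on the gfsh syntax of Apache Geode, on which this distribution is based, so verify them with 'help' in MASH. The member names and directories are hypothetical.

```shell
#!/bin/sh
# Sketch: print the MASH commands for restarting on the new binaries while
# reusing the previous version's working directories.
# ASSUMPTION: "start locator", "start server" and --name follow Apache Geode's
# gfsh syntax; only --dir is mentioned in these release notes.

# print_start_commands LOCATOR_NAME LOCATOR_DIR SERVER_NAME SERVER_DIR
print_start_commands() {
    printf 'start locator --name=%s --dir=%s\n' "$1" "$2"
    printf 'start server --name=%s --dir=%s\n' "$3" "$4"
}

# Hypothetical member names and previous-version directories.
print_start_commands locator1 /data/ampool/locator1 server1 /data/ampool/server1
```

Pointing --dir at the previous version's directories is what lets the new binaries find the existing locator and server state, so these paths should be the ones used before the upgrade rather than fresh directories.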

Connectors

  • Untar the newer version of the connector packages and use them in place of the previous version in the classpath when using Spark, Hive and Kafka with the newer version of Ampool.
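For Spark, one way to swap in the new connector jars is to build the comma-separated list that the --jars option of spark-shell and spark-submit expects. This is a minimal sketch: jar_list is a hypothetical helper, and the location of the jars inside the connector package is an assumption, so check the package README for the actual layout.

```shell
#!/bin/sh
# Sketch: collect connector jars into the comma-separated form used by
# Spark's --jars option. jar_list is a hypothetical helper.

# jar_list DIR
# Prints the .jar files directly under DIR as one comma-separated line.
jar_list() {
    list=""
    for jar in "$1"/*.jar; do
        [ -e "$jar" ] || continue   # skip the unexpanded pattern when DIR has no jars
        list="${list:+$list,}$jar"
    done
    printf '%s\n' "$list"
}

# Example usage (the path is an assumption, not a documented location):
#   spark-shell --jars "$(jar_list /opt/ampool-spark-1.3.1/lib)"
```

Building the list from the new package directory, rather than editing an old list by hand, avoids leaving jars from the previous connector version on the classpath.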

Resolved Issues

Issue Ref Description
GEN-1047 Feature Request: MASH CLI command to export the MTable data.
GEN-1565 describe table for FTable mentions MTable.
GEN-1541 ampool-core package has incorrect naming of jars.
GEN-1397 FTable: Support for Expiration policy for tier-0 and tier-1.
GEN-1521 Spark-Ampool connector: Support version Spark 2.1 and Scala 2.11.
GEN-1554 Disable the check for existence of same key within a batch.
GEN-1323 Getting the reference to MTable/FTable from Client Cache is costly, need to cache the instance.
GEN-1414 Kafka connector to ingest data into Ampool MTable.
GEN-1499 Integrate SharedStore as tier 1 store.
GEN-1199 MTable.checkAndPut() define and implement API behavior for not existing rowKey.
GEN-1392 FTable: Design and Develop HDFS as archive tier.
GEN-1479 TierStore: Store implementation for local disk and HDFS.
GEN-1393 FTable: Support (insert) time based partitioning for local disk (ORC) tier.
GEN-1153 Mash mscan on unordered MTable.
GEN-1333 FTable: Implement Hive Connector.
GEN-1235 For a persisted MTable, the number of records returned by a scan after cluster restart may differ from the number of records returned by scan before cluster restart.
GEN-1228 Scan operation may fail if a failover happens while scan is running.

Known Issues & Limitations

Issue Ref | Description | Workaround (if any)
GEN-1161 | On local developer machines, if Ampool services are started without specifying the host, they may bind to the Wi-Fi address, which changes when the machine moves between networks. In such scenarios, reconnecting to the locator from MASH fails. | Manually kill the Ampool (locator and server) processes and restart the services. Alternatively, specify localhost or a stable network interface for these services to bind to.
Limitation | A coprocessor cannot be called on an empty table. | Endpoint coprocessor execution is not supported on an empty table. To check whether a table is empty, use the MTable.isEmpty() API.
GEN-1144 | The runExamples script fails when an Ampool cluster is already running. | Run the runExamples script after stopping the Ampool cluster.

Versions & Compatibility

This distribution is based on Apache Geode release 1.0.0-incubating.M3. The following table summarizes the minimum versions supported for the different connectors:

Connector Version
Apache Spark 2.1.0
Scala 2.11
Apache Hive 0.14.0, 1.2.1
Apache Kafka 0.10.0.1/confluent-3.0.0

  • Code examples: A set of code samples showcasing the table and coprocessor APIs can be found under the <installation_dir>/examples folder.