Apache HBase is an open-source, NoSQL database that is built on Hadoop and modeled after Google BigTable.
- HBase provides random access and strong consistency for large amounts of unstructured and semi structured data in a schemaless database organized by column families(data stored in the row is grouped together and is so called as column family).
- Schemaless Database
- Hbase can handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
- This allows low latency allowing human-unnoticeable delays between an input being processed and the corresponding output providing real time characteristics.and increased elasticity in performance and cost choices.
- This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of end points, and to analyze this data with Hadoop jobs.
- The HDInsight implementation leverages the scale-out architecture of HBase to provide automatic sharding (Sharding is a type of database partitioning that separates very large databases the into smaller, faster, more easily managed parts called data shards. The word shard means a small part of a whole.) of tables.
- Strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory caching for reads and high-throughput streaming for writes.
How is data managed in HDInsight HBase?
Data can be managed in HBase by using the
scancommands from the HBase shell. Data is written to the database by using
putand read by using
scancommand is used to obtain data from multiple rows in a table. Data can also be managed using the HBase C# API, which provides a client library on top of the HBase REST API. An HBase database can also be queried by using Hive.