Apache Drill

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Free

Description

Apache Drill offers a flexible and high-performance solution for querying diverse data sources directly using standard SQL. It eliminates the need for traditional data preparation steps like schema creation, data loading, and transformations, enabling users to gain faster insights from raw, multi-structured, or nested data. Drill connects to a wide range of datastores including HBase, MongoDB, HDFS, S3, Azure Blob Storage, and Google Cloud Storage, even allowing joins across different sources within a single query.

Leveraging a powerful JSON data model and a columnar execution engine optimized for complex data, Drill provides speed and flexibility. It integrates seamlessly with existing Business Intelligence tools (like Tableau, Qlik, MicroStrategy, Spotfire, Excel) through standard JDBC/ODBC drivers and offers a REST API for custom application development. Its distributed, scalable architecture allows deployment from a single laptop to large clusters, optimizing performance through techniques like data locality and advanced query compilation.
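
For a sense of how this looks in practice, the short Python sketch below submits a schema-free SQL query to Drill's REST API. It is a minimal sketch, assuming a Drillbit running locally on the default port 8047 and a hypothetical raw JSON file at /data/logs/events.json reachable through the default dfs storage plugin.

    # Minimal sketch: run a SQL query against a raw JSON file via Drill's REST API.
    # Assumes a local Drillbit on the default port 8047; the file path is hypothetical.
    import requests

    DRILL_URL = "http://localhost:8047/query.json"

    # No schema definition or load step: Drill infers the structure at read time.
    sql = """
        SELECT t.event_type, COUNT(*) AS events
        FROM dfs.`/data/logs/events.json` t
        GROUP BY t.event_type
    """

    response = requests.post(
        DRILL_URL,
        json={"queryType": "SQL", "query": sql},
        timeout=60,
    )
    response.raise_for_status()

    result = response.json()
    print(result["columns"])     # column names returned by Drill
    for row in result["rows"]:   # each row is a dict keyed by column name
        print(row)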

Key Features

  • Schema-Free Querying: Query data without needing predefined schemas.
  • Broad Data Source Support: Connects to NoSQL (HBase, MongoDB), Hadoop (HDFS), Cloud Storage (S3, Azure Blob, GCS, Swift), NAS, and local files.
  • Cross-Datastore Joins: Combine data from multiple sources (e.g., MongoDB and Hadoop) in one query (see the sketch after this list).
  • JSON Data Model: Handles complex, nested, and evolving data structures.
  • Columnar Execution Engine: Delivers high performance, even with complex data types.
  • Standard SQL Support: Leverage existing SQL skills and tools.
  • BI Tool Integration: Connects via JDBC/ODBC drivers to tools like Tableau, Qlik, MicroStrategy, Spotfire, Excel.
  • REST API: Enables integration with custom applications.
  • Datastore-Aware Optimization: Pushes processing down to the source system where possible.
  • Scalability: Runs on a single machine or scales out to large clusters (1000s of nodes).
  • Data Locality Awareness: Optimizes performance by processing data close to its storage location.
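
The cross-datastore join called out above can be sketched as a single SQL statement that joins a MongoDB collection to a Parquet file, submitted over the same REST API. The storage plugin names (mongo, dfs), the sales.orders collection, and the Parquet path are illustrative assumptions, not fixed names.

    # Minimal sketch of a cross-datastore join: MongoDB orders joined to customer
    # records stored as Parquet, in one SQL statement. Plugin names (mongo, dfs),
    # the sales.orders collection, and the Parquet path are illustrative assumptions.
    import requests

    sql = """
        SELECT c.name, SUM(o.amount) AS total_spend
        FROM mongo.sales.orders o
        JOIN dfs.`/warehouse/customers.parquet` c
          ON o.customer_id = c.id
        GROUP BY c.name
        ORDER BY total_spend DESC
    """

    resp = requests.post(
        "http://localhost:8047/query.json",
        json={"queryType": "SQL", "query": sql},
        timeout=120,
    )
    resp.raise_for_status()

    for row in resp.json()["rows"]:
        print(row)

Where the source system supports it, Drill's datastore-aware optimization pushes filters and projections down to MongoDB rather than pulling the full collection across the network.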

Use Cases

  • Querying NoSQL databases using standard SQL.
  • Analyzing data directly in Hadoop or cloud storage without ETL.
  • Joining data from different types of datastores.
  • Exploring complex or nested JSON/Parquet data (see the sketch after this list).
  • Connecting BI tools to non-relational data sources.
  • Building custom data applications requiring access to diverse backend data.
  • Performing ad-hoc analysis on raw data logs or semi-structured data.

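As a sketch of the nested-data and BI-connectivity use cases above, the snippet below connects through the Drill ODBC driver with pyodbc and uses dot notation plus Drill's FLATTEN function to un-nest an array field. The DSN name, file path, and field names are assumptions for illustration, and the ODBC driver must be installed and configured separately.

    # Minimal sketch: explore nested JSON through the Drill ODBC driver.
    # Assumes a DSN named "Drill" is configured; the file path and the nested
    # field names (t.device.os, t.tags) are hypothetical.
    import pyodbc

    # autocommit is enabled because Drill does not support transactions
    conn = pyodbc.connect("DSN=Drill", autocommit=True)
    cursor = conn.cursor()

    # Dot notation reaches into nested maps; FLATTEN un-nests the tags array,
    # producing one output row per array element.
    sql = """
        SELECT t.device.os AS os, FLATTEN(t.tags) AS tag
        FROM dfs.`/data/telemetry/sample.json` t
        LIMIT 20
    """

    cursor.execute(sql)
    for row in cursor.fetchall():
        print(row[0], row[1])

    conn.close()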