Pivotal Greenplum is the only open source, shared-nothing, massively parallel processing (MPP) data warehouse designed for business intelligence processing and advanced data analytics. Its enterprise-grade analytical database provides powerful and rapid analytics on very large volumes of data.
Uniquely geared towards machine learning and advanced data science, Greenplum is powered by the world’s most advanced cost-based query optimiser. It delivers unequalled analytical query performance on large data volumes, along with flexibility, a complete set of features and tight integration with leading analytical libraries and software stacks.
Static IT budgets, exploding data volumes and an ever-evolving competitive landscape have catalysed new ways of thinking about effective systems for data analytics in enterprises. Legacy data management solutions have not been able to scale to the volume of data and deliver the advanced analytical capabilities needed to address this new market reality. At the same time, proven massively parallel processing data warehouses have led to new approaches to effective data exploration and business insights.
KEY ARCHITECTURAL TENETS
Each server node in the Greenplum warehouse owns and manages a distinct portion of the overall data. The system automatically distributes data and parallelises query workloads across all available hardware, moving the processing dramatically closer to the data and its users and delivering maximum resource utilisation and expressiveness.
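As a rough illustration of how rows can be spread across nodes so that each node owns a distinct portion of the data, the sketch below assigns every row to a segment by hashing a distribution key. It is a simplified model written for this overview, not Greenplum's actual implementation: the function names, the use of CRC32 as the hash and the sample column customer_id are all assumptions.

```python
import zlib
from collections import defaultdict


def segment_for(key, num_segments):
    """Map a distribution key to a segment index using a stable hash.

    CRC32 is used only because it is deterministic across runs;
    it is an assumption, not the hash Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments):
    """Group rows by the segment that would own them."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return dict(segments)


if __name__ == "__main__":
    # Hypothetical sample data: 12 rows keyed by customer_id.
    sample = [{"customer_id": i, "amount": i * 10} for i in range(12)]
    for seg, owned in sorted(distribute(sample, "customer_id", 4).items()):
        print(seg, [r["customer_id"] for r in owned])
```

Because the segment is derived from the key alone, rows with the same key always land on the same node, which is what allows each node to work on its own portion independently.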
The shared-nothing MPP architecture enables massive data storage, loading and processing with unlimited linear scalability. Adaptive services provide enterprises with high availability, workload management and more. Key product features include petabyte-scale loading, polymorphic storage, comprehensive SQL language support and advanced machine learning support. Additionally, all major third-party analytical and administration tools are supported through standard client interfaces. Greenplum is regarded as the most scalable mission-critical analytical database and is used by a large number of leading enterprises worldwide.
CORE CAPABILITIES DELIVER A FULLY FEATURED DATA WAREHOUSE
Greenplum incorporates several core capabilities that deliver extremely high query performance and throughput, reliable query completeness and correctness, as well as strong support for complex queries at petabyte-scale data volumes with mixed workloads.
Proven Open Source Technology: After a decade of software hardening, Pivotal made Greenplum available as an open source data warehouse called the “Greenplum Database”.
Massively Parallel Processing Architecture: The Pivotal Greenplum architecture provides automatic parallelisation of data and queries: all data is automatically partitioned across all nodes of the system, while queries are planned and executed using all nodes working together in a highly coordinated fashion.
Petabyte-Scale Loading: High-performance loading uses MPP Scatter/Gather Streaming technology.
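The idea of a query being executed by all nodes working together can be sketched as a two-phase aggregate: each segment computes a partial result over its own rows, and a coordinator combines the partials. The example below is a conceptual model only. The function names, the amount column and the use of a thread pool to stand in for separate nodes are assumptions made for illustration, not a description of Greenplum internals.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(segment_rows):
    """Phase 1: runs on each segment, over that segment's rows only."""
    values = [row["amount"] for row in segment_rows]
    return sum(values), len(values)


def parallel_average(segments):
    """Phase 2: the coordinator merges per-segment partial results.

    Returns None when there are no rows at all.
    """
    # Threads stand in for independent segment nodes (an assumption).
    with ThreadPoolExecutor(max_workers=max(1, len(segments))) as pool:
        partials = list(pool.map(partial_sum_count, segments))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    # Hypothetical data already partitioned across three segments.
    segments = [
        [{"amount": 10}, {"amount": 20}],
        [{"amount": 30}],
        [],
    ]
    print(parallel_average(segments))
```

An average cannot be merged directly from per-segment averages, so each segment returns a sum and a count instead. This is the kind of decomposition a parallel planner has to make for every aggregate.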
SOLVING BUSINESS PROBLEMS WITH PROVEN ANALYTICS
Comprehensive SQL support: Greenplum offers comprehensive SQL-92 and SQL-99 language support with SQL 2003 OLAP extensions.
Advanced machine learning: Greenplum has some of the most advanced machine learning support among analytical databases in the industry. These capabilities are provided through Apache MADlib (incubating), an open source library for scalable in-database analytics that extends the SQL capabilities of Greenplum through user-defined functions.
Support for PL/* programmable analytics: Greenplum enables users to implement functions in PL/Python, PL/Java, PL/R, PL/pgSQL, PL/Perl etc. that are executed in massively parallel mode. This enables powerful programmatic analytics capabilities to be executed natively at massive scale, as required by the use cases.
Data Federation using GPHDFS: Greenplum supports data federation with all the major Hadoop distributions, enabling the use of HDFS to create and update external tables, thereby minimising data movement.
PostGIS support: Greenplum has extensive support for PostGIS, a spatial database extension for PostgreSQL that allows GIS (Geographic Information Systems) objects to be stored and processed in the database. The Greenplum PostGIS extension includes support for spatial indexes and functions for analysis and processing of GIS objects.
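To give a feel for the SQL 2003 OLAP extensions mentioned above, the sketch below runs a window function, RANK() OVER (PARTITION BY ...), which ranks rows within each group without collapsing them. It uses Python's built-in sqlite3 module purely as a convenient stand-in so the example is runnable anywhere. It does not connect to Greenplum, it needs SQLite 3.25 or later for window function support, and the sales table, its columns and the sample rows are hypothetical.

```python
import sqlite3


def rank_sales_by_region(rows):
    """Rank each rep's sales within their region using a window function."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Greenplum
    try:
        conn.execute(
            "CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        query = """
            SELECT region, rep, amount,
                   RANK() OVER (PARTITION BY region
                                ORDER BY amount DESC) AS rnk
            FROM sales
            ORDER BY region, rnk, rep
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [
        ("east", "ana", 300),
        ("east", "bo", 500),
        ("west", "cy", 200),
        ("west", "di", 200),
        ("west", "ed", 100),
    ]
    for row in rank_sales_by_region(sample):
        print(row)
```

Tied amounts share a rank and the next rank is skipped, so the two "west" reps with 200 both rank first and the third rep ranks third.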
SECURITY AND BUSINESS CONTINUITY
Data Security: Security is a key consideration for ensuring enterprise policy and regulatory compliance for the data managed in analytical databases. Security can be categorised as authentication, authorisation, audit and data encryption.
Fault tolerance and data availability: Fault tolerance and data availability are achieved via a series of mechanisms, including hardware-level RAID, software-level mirroring, dual cluster mechanisms (for active-standby and active-active operation) and backup and restore.
SIMPLIFIED MANAGEMENT AND FLEXIBLE DEPLOYMENT
Greenplum Command Centre and Package Manager: Greenplum Command Centre monitors system performance metrics, analyses system health and allows administrators to perform management tasks, such as start, stop and recovery.
Flexible Deployment Model: Greenplum is available as part of the Pivotal Big Data Suite and supports multiple deployment models:
• Software: Packaged software distribution for integration with user provided commodity hardware running Linux OS.
• Appliance: EMC Data Computing Appliance (DCA) – fully integrated Hardware + Software solution, available in a range from a ¼ rack with four nodes to hundreds of nodes.
• Virtualised IaaS: Deployment in a virtualised compute + storage environment.
This flexibility in deployment models caters to multiple enterprise considerations around cost, performance, control, security, regulatory requirements etc.
KEY BENEFITS OF GREENPLUM
• Proven open source technology hardened over ten years
• Leverage existing SQL tools and skills
• Offload and take over workloads with the best cost, performance and scale
• Leverage advanced analytics to solve business problems
• Implement business-critical analytical use cases with robust security and business continuity
• Leverage data federation with all major Hadoop distributions to build end-to-end use cases
• Leverage flexible deployment models to address enterprise needs
• Leverage flexible licensing as part of a larger Pivotal Big Data Suite
FEATURES OF GREENPLUM
• Shared-nothing, massively parallel processing data warehouse
• Fastest MPP data ingest platform
• Full SQL and ACID compliance
• Pivotal Query Optimiser supporting complex, interactive queries at big data volumes
• Polymorphic storage and multi-level partitioning for flexible and efficient queries
• Advanced analytics capabilities
• Data federation with all major Hadoop distributions
• Enterprise-grade security and business continuity
• Flexible deployment options
Ingram Micro doo Beograd,
11070 Novi Beograd, Tošin bunar 272V
+381 (65) 20 42 099
vmware-serbia@ingrammicro.com