Control Systems and Computers, N1, 2018, Article 6

DOI: https://doi.org/10.15407/usim.2018.01.057

Upr. sist. maš., 2018, Issue 1 (273), pp. 58-71.

UDC 004.65:004.7:004.75:004.738.5

Oursatyev Alexey A., PhD in Techn. Sciences, Leading Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, aleksei@irtc.org.ua

Big Data. Analytical Databases and Warehouse: Vertica, Kdb

Introduction. This article continues the research on Big Data and on the toolkit that is evolving into a new generation of technologies and architectures of database platforms and data warehouses for intelligent output. It presents advanced industrial developments by world-famous IT companies, including some specialized hardware-software storage and processing systems described by Dion Hinchcliffe. The opportunity of Big Data, the closing of the “clue gap”, is presented, and the changes in the infrastructure, tools and platforms for obtaining the necessary information and new knowledge from Big Data are analysed. The material is organized so that the focus is on the transformation of a known database environment, database or data warehouse for intelligent output, while initial information about a specific product is given in its general product characteristics. The second part of the review covers the Vertica and Kdb databases.

Purpose. To consider infrastructure solutions for new analytics-oriented developments and to evaluate the effectiveness of their application in Big Data studies aimed at new knowledge, the discovery of implicit connections and in-depth understanding.

Methods. Information-analytical methods and technologies of data processing, the methods of their estimation and forecasting, taking into account the development of the most important branches of informatics and information technologies.

Results. The column-oriented DBMS Vertica, based on the MPP architecture, is designed to work in a horizontally scalable environment.
Starting with version 7.0, the HP Vertica Analytics Platform integrated into its platform ecosystem products comprising Apache Hadoop and a number of additional modules, which made it possible to create a special storage and processing area, Flex Zone, based on flex (flexible) tables. Flex Zone accepts unstructured or loosely structured (schema-less) data without a NoSQL intermediary. It imposes minimal structuring, basically interpreting raw data as a series of key-value pairs. The raw data can then be queried using SQL, either directly or through BI tools. This gives a useful, though imprecise, reading of the data. HPE Vertica is currently based on the Vertica DBMS core and offers integration with Hadoop for analytics (SQL for Hadoop). The Hadoop stack uses the Apache Spark framework with Spark SQL, which relies on its Catalyst and Tungsten optimizers. The first supports cost-based optimization; the second, Tungsten, aims to increase memory and processor efficiency for Spark applications, bringing performance closer to the limits of state-of-the-art hardware.
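The idea of reading schema-less records as key-value pairs and then querying them by column name can be illustrated with a minimal Python sketch. It is not Vertica's implementation or API; the function names `load_flex` and `select` and the sample records are hypothetical and serve only to show why such a reading is useful but imprecise (keys that a record lacks simply come back empty).

```python
import json


def load_flex(raw_lines):
    """Parse each raw JSON line into a dict of key-value pairs (no fixed schema)."""
    return [json.loads(line) for line in raw_lines]


def select(rows, columns):
    """Project the requested keys; a key missing from a record yields None (like SQL NULL)."""
    return [tuple(row.get(col) for col in columns) for row in rows]


# Hypothetical raw records with differing sets of keys.
raw = [
    '{"user": "ann", "clicks": 3}',
    '{"user": "bob", "country": "UA"}',
]

rows = load_flex(raw)
result = select(rows, ["user", "clicks"])
print(result)
```

The second record has no `clicks` key, so the projection returns an empty value for it instead of failing, which mirrors the minimal structuring described above.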
Kdb, the high-performance column-oriented database from Kx Systems, uses its own programming language, K. This language is built on the paradigms of array (matrix) and functional programming, which provides compact and fast processing of data arrays. It is oriented toward mathematical analysis and financial forecasting and is designed for databases and financial applications. Later, Kx added a database of historical data and released a 64-bit version of the kdb database called kdb+, which includes the Q language, combining the capabilities of K and ksql, an SQL-like query language. Currently only Q is promoted and commercialized, as kdb+.
The database is an in-memory database. Kdb+ is suitable for processing data both in memory and on disk in a unified manner. The same architecture is used for real-time data and historical data.
At present, Kx Systems offers its Kx for DaaS solution as a dynamic, on-demand data delivery service. The Kx solution is a modern real-time data and analytics platform providing a set of tools for managing data from the moment it is received to its coordinated and controlled consumption by multiple parties.
The Kx solutions are discussed in comparison with Gartner's HTAP architecture, which allows applications to analyze data directly as it is received and updated by transaction-processing functions. This makes it possible both to process transactions and to perform advanced analytics, making decisions in real time using hybrid streaming and in-memory computing. Analysis (forecasting and modeling) becomes an integral part of the process rather than a separate activity.

Conclusion. According to Gartner analysts, HPE Vertica focuses on the main market trends, supporting big data analytics in the cloud and the logical data warehouse (LDW), from Vertica SQL to Hadoop. The data management offering is a promising set of tools, but it is positioned separately from Vertica, which indicates a fragmented strategy between these two solutions. As Gartner points out, HPE Vertica is widely used for different use cases and data types and stands out from the other solutions reviewed for its fast query response.

Gartner believes that individual HTAP systems will contribute to logical or physical data warehouses but will not completely replace them. At the same time, the data warehousing architecture will remain necessary to support extended analysis involving a large amount of historical data or big data coming from internal and external, structured and unstructured sources.

Keywords: MPP architecture, HTAP (hybrid transactional/analytical processing), logical data warehouse, cloud storage, database platform as a service (DBPaaS), SaaS analytics model, data management environment, IMC technology.

The full text is available in Russian.

  1. GRITSENKO, V.I., OURSATYEV, A.A., 2017. “Big Data and the Tools for Analytics”, Upr. sist. maš., 4, pp. 3–14. (In Russian).
  2. HINCHCLIFFE, DION. The enterprise opportunity of Big Data: Closing the “clue gap”. [online] Available at: <http://www.zdnet.com/article/the-enterprise-opportunity-of-big-data-closing-the-clue-gap/> [Accessed 18 September 2017].
  3. HP Vertica. [online] Available at: <http://www.vertica.com/> [Accessed 18 September 2017].
  4. IT architect of the data warehouse architect. The choice of Vertica VS. [online] Available at: <http://ascrus.blogspot.com/2013/01/vertica-vs.html> [Accessed 28 January 2013]. (In Russian).
  5. BORCHUK, L., 2016. “Value Optimizers for DBMS: yesterday and today”. Open Systems, 1, pp. 36-39. (In Russian).
  6. HP Vertica Analytics Platform Version 7.0.x Documentation. Flex Zone. [online] Available at: <https://my.vertica.com/docs/7.0.x/HTML/index.htm#Authoring/FlexTables/FlexTableHandbook.htm%3FTocPath%3DFlex%2520Tables%2520Guide%7C_0> [Accessed 27 September 2017].
  7. BRUST, ANDREW. Vertica 7 to NoSQL DBs: Drop dead. ZDNet – for Big on Data, Topic: Big Data Analytic. [online] Available at: <http://www.zdnet.com/article/vertica-7-to-nosql-dbs-drop-dead/> [Accessed 21 Nov. 2013].
  8. ARMBRUST M., XIN R., LIAN C. et al. Spark SQL: Relational Data Processing in Spark. Proc. of the 2015 ACM SIGMOD Int. Conf. on Management of Data, 31 May – 4 June 2015. Melbourne, Victoria, Australia, 2015. [online] Available at: <http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf> [Accessed 28 June 2015].
  9. Vertica Blog. Looking Under the Hood at Vertica Queries. [online] Available at: <https://my.vertica.com/blog/looking-under-the-hood-at-vertica-queriesba-p235038/> [Accessed 02 Mar. 2016].
  10. Spark SQL and DataFrames. Spark 1.5.2 Documentation. [online] Available at: <http://spark.apache.org/docs/latest/sql-programming-guide.html> [Accessed 2 January 2017].
  11. ARMBRUST M., HUAI Y., LIANG C. et al. Deep Dive into Spark SQL’s Catalyst Optimizer. [online] Available at: <https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html> [Accessed 15 Apr. 2015].
  12. XIN R., ROSEN J. Project Tungsten: Bringing Apache Spark Closer to Bare Metal. [online] Available at: <https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html> [Accessed 28 April 2015].
  13. BEYER, M.A., EDJLALI, R. Magic Quadrant for Data Warehouse Database Management Systems. [online] Available at: <https://www.slideshare.net/paramitap/gartner-magic-quadrant-for-data-warehouse-database-management-systems> [Accessed 7 Mar. 2014].
  14. HP Haven OnDemand. [online] Available at: <http://www8.hp.com/ua/ru/software-solutions/big-data-cloud-haven-ondemand/> [Accessed 8 Dec. 2016].
  15. Platform for large amounts of data. [online] Available at: <http://www8.hp.com/ua/ru/software-solutions/big-data-platform-haven/> [Accessed 8 Dec. 2017]. (In Russian).
  16. Kx. [online] Available at: <https://kx.com> [Accessed 15 Oct. 2017].
  17. Encyclopedia of programming languages. K (programming language). [online] Available at: <http://progopedia.ru/language/k/> [Accessed 28 January 2017]. (In Russian).
  18. GRAVES, STEVE. In-Memory Database Systems. [online] Available at: <http://www.linuxjournal.com/article/6133> [Accessed 1 Sept. 2002].
  19. Gartner. Delivering Scalable and Robust Data Infrastructures with DaaS in Financial Markets. Kx for DaaS, Feb. 2017. [online] Available at: <http://www.gartner.com/imagesrv/media-products/pdf/Kx/KX-1-3RU8DEE.pdf> [Accessed 12 Oct. 2015].
  20. Gartner. Real-time Insights and Decision Making using Hybrid Streaming, In-Memory Computing Analytics and Transaction Processing. [online] Available at: <https://www.gartner.com/imagesrv/media-products/pdf/Kx/KX-1-3CZ44RH.pdf> [Accessed 17 June 2016].
  21. PEZZINI, MASSIMO. Predicts 2016: In-Memory Computing-Enabled Hybrid Transaction/Analytical Processing Supports Dramatic Digital Business Innovation. [online] Available at: <https://www.linkedin.com/pulse/predicts-2016-in-memory-computing-enabled-hybrid-supports-pezzini> [Accessed 14 January 2016].
  22. COLMER, P. In Memory Data Grid Technologies. [online] Available at: <http://highscalability.com/blog/2011/12/21/in-memory-data-grid-technologies.html> [Accessed 21 Dec. 2011].

Received 16.01.2018