Control Systems and Computers, N1, 2019, Article 6

https://doi.org/10.15407/usim.2019.01.052

Upr. sist. maš., 2019, Issue 1 (279), pp. 52-67.

UDC 004.65:004.7:004.75:004.738.5

O.A. Oursatyev, PhD (Eng.), Leading Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, aleksei@irtc.org.ua

BIG DATA. ANALYTICAL DATABASES AND DATA WAREHOUSE: NETEZZA

Introduction. This article continues a study of Big Data and its tools, which are evolving into next-generation technologies and into architectures of database platforms and storage for intelligent inference. This part reviews the Netezza database. The main focus is on changes to the infrastructure, the tooling and the platform used to extract the necessary information and new knowledge from Big Data; basic information about the product is given in its general description.

Purpose. To examine and evaluate how effectively infrastructure solutions support new developments in the study of Big Data: identifying new knowledge and implicit connections, and gaining an in-depth understanding of, and insight into, phenomena and processes.

Methods. Informational and analytical methods and technologies for data processing, and methods for data assessment and forecasting, taking into account the development of the most important areas of informatics and information technology.

Results. Netezza, like Teradata, is delivered preconfigured and ready for immediate use. It is a hardware and software appliance that combines data storage and processing in a single system and was designed and optimized for analytics from the outset. It uses a shared-nothing MPP environment with an SMP server at the top level of the asymmetric AMPP architecture. In addition to coordinating work, the SMP server improves performance as the number of client sessions grows. A significant part of data processing is performed practically at the level of the disk controllers of the SPU nodes, the "intelligent" snippet processing units. Data is loaded using both the standard utilities and ETL tools. Netezza intends to change the current situation and move to truly parallel loading.
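The division of labor between the SMP host and the shared-nothing SPU nodes can be illustrated with a minimal Python sketch. It is not Netezza code: the data, the function names (`run_snippet`, `host_query`) and the use of a thread pool to stand in for independent SPU nodes are all assumptions made for illustration. Each node filters and pre-aggregates only its own slice, and the host merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each SPU owns its own slice of the table (shared nothing).
SPU_SLICES = [
    [("kyiv", 120), ("lviv", 80)],
    [("kyiv", 30), ("odesa", 50)],
    [("lviv", 20), ("odesa", 70)],
]


def run_snippet(rows, min_amount):
    """Work done locally on one SPU: filter its rows, then pre-aggregate."""
    partial = {}
    for city, amount in rows:
        if amount >= min_amount:
            partial[city] = partial.get(city, 0) + amount
    return partial


def host_query(slices, min_amount):
    """Work done on the SMP host: fan the snippet out, merge partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: run_snippet(s, min_amount), slices))
    totals = {}
    for partial in partials:
        for city, amount in partial.items():
            totals[city] = totals.get(city, 0) + amount
    return totals


# Roughly: SELECT city, SUM(amount) ... WHERE amount >= 30 GROUP BY city
RESULT = host_query(SPU_SLICES, min_amount=30)
print(RESULT)
```

In this sketch the host never sees raw rows, only small per-node aggregates, which is the asymmetry the AMPP description points to.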

A key feature of Netezza is the performance multiplier of the analytic appliance, which provides significant hardware acceleration for executing SQL queries. Programmable logic arrays (FPGAs) on the SPU nodes process data as a stream while it is read from disk. As a result, the memory and SPU processors work with already-filtered data, which significantly speeds up further processing. The streaming technology of the FAST Engines™ Framework is implemented by programming functions for decompression, filtering, syntax checking, transaction visibility, etc. The set of FAST Engines streaming mechanisms makes it possible to create new functions for emerging problems.
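The idea of filtering data in a stream before it reaches memory and the CPU can be sketched in Python with chained generators. This is a software analogy under stated assumptions, not the FAST Engines implementation: the block format (zlib-compressed JSON), the row fields `created_txid` and `deleted_txid`, and the simplified snapshot-style visibility rule are all hypothetical.

```python
import json
import zlib


def make_block(rows):
    """Hypothetical on-disk format: a zlib-compressed JSON list of rows."""
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress(blocks):
    """Stage 1: decompress each block and stream out its rows."""
    for block in blocks:
        yield from json.loads(zlib.decompress(block))


def visible(rows, snapshot_txid):
    """Stage 2: keep rows created by the snapshot and not yet deleted at it."""
    for row in rows:
        deleted = row["deleted_txid"]
        created_in_time = row["created_txid"] <= snapshot_txid
        if created_in_time and (deleted is None or deleted > snapshot_txid):
            yield row


def restrict(rows, predicate):
    """Stage 3: apply the WHERE-style filter."""
    return (row for row in rows if predicate(row))


def project(rows, columns):
    """Stage 4: keep only the requested columns."""
    return ({c: row[c] for c in columns} for row in rows)


BLOCKS = [
    make_block([
        {"id": 1, "region": "EU", "amount": 10, "created_txid": 1, "deleted_txid": None},
        {"id": 2, "region": "US", "amount": 20, "created_txid": 1, "deleted_txid": None},
    ]),
    make_block([
        {"id": 3, "region": "EU", "amount": 30, "created_txid": 2, "deleted_txid": 3},
        {"id": 4, "region": "EU", "amount": 40, "created_txid": 9, "deleted_txid": None},
        {"id": 5, "region": "EU", "amount": 50, "created_txid": 2, "deleted_txid": None},
    ]),
]

# Only rows that survive every stage are materialized for the "CPU".
stream = decompress(BLOCKS)
stream = visible(stream, snapshot_txid=5)
stream = restrict(stream, lambda row: row["region"] == "EU")
FILTERED = list(project(stream, ["id", "amount"]))
print(FILTERED)
```

Because every stage is a generator, rows are discarded one at a time as they are read, so the consumer at the end of the pipeline only ever holds the reduced result.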

Particular attention is paid to ease of use and to minimizing configuration requirements. Netezza requires almost no administration. For example, data compression is performed automatically and adapts to the data types, without requiring the user to specify algorithms. There is no database configuration or design work, and there are no data model requirements. There are no indexes and no tuning, including for ad hoc queries: performance is available out of the box. Workload management provides functionality for managing resources and prioritizing query execution in a multi-user environment and under mixed workloads. To extend the range of tasks and develop custom analytical processes, users can work in C/C++, Java, Python, Fortran and R, with support for an extensible open-source integrated development environment (a plug-in for Eclipse).
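Prioritizing query execution under a mixed workload can be illustrated with a small priority queue in Python. This is a generic sketch, not Netezza's workload manager: the class `QueryScheduler`, the query names and the numeric priority scale (lower means more urgent) are assumptions chosen for the example.

```python
import heapq
import itertools


class QueryScheduler:
    """Toy scheduler: runs the most urgent query first, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities keep arrival order.
        self._counter = itertools.count()

    def submit(self, name, priority):
        """Queue a query; a lower priority number means more urgent."""
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_query(self):
        """Remove and return the name of the most urgent queued query."""
        _priority, _seq, name = heapq.heappop(self._heap)
        return name

    def drain(self):
        """Return all queued queries in the order they would be executed."""
        order = []
        while self._heap:
            order.append(self.next_query())
        return order


scheduler = QueryScheduler()
scheduler.submit("nightly_etl_load", priority=3)
scheduler.submit("dashboard_refresh", priority=1)
scheduler.submit("ad_hoc_report", priority=2)
scheduler.submit("ceo_dashboard", priority=1)

ORDER = scheduler.drain()
print(ORDER)
```

A real workload manager would also cap the resources given to each class of query; the sketch shows only the ordering aspect.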

Netezza is primarily an analytical appliance with highly developed analytics tools, such as Data Mining, OLAP, Hadoop and others, yet, according to Monash Research, it has one of the lowest costs per terabyte of user data in the industry.

Netezza's main competitors include Teradata, Vertica, IBM, Greenplum and others.

Conclusion. Netezza integrates with existing IBM products that add IBM DBPaaS cloud storage capabilities, including the use of various platforms in local and hybrid clouds, as well as support for Apache Spark analytics inside the DBMS on Db2 platforms, and much more. At the same time, Netezza appears to remain a consistent proponent of reprogrammable data processing hardware and of software improvements embedded in the SPU, since this is where it gained a significant performance boost. This raises the question of whether Netezza will go on to integrate production in-memory Apache Spark software, with its standard libraries for big data analysis, into the SPU to complement the existing frameworks on the FPGA. At the very least, within the appliance model this seems more interesting than building efficient hybrid systems that can both process many transactions and simultaneously scan large volumes of data for analytical queries.

 Download full text! (In Russian)

Keywords: data warehouse appliance platform, AMPP™ – asymmetric massively parallel processing, SN (shared nothing) MPP architecture, SPU (snippet processing unit) – modules for processing code fragments, FPGA – programmable logic arrays – Intelligent Query Streaming® Netezza® component, IBM Netezza, SQL analytics on Hadoop, support for analytics on Apache Spark on IBM Db2 platforms.


Received 14.05.2018