Control Systems and Computers, N3, 2017, Article 1
DOI: https://doi.org/10.15407/usim.2017.03.006
Upr. sist. maš., 2017, Issue 3 (269), pp. 6-19.
UDC 004.9:004.75:004.451.82:004.738.52:004.823
A.P. Lozinskiy1, V.M. Simakhin2, A.A. Oursatyev3
Technologies Modeling for Processing Large Data on the Local Cloud Platform
1 Junior Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, loza@irtc.org.ua
2 Engineer, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, sima@irtc.org.ua
3 PhD in Techn. Sciences, Leading Research Associate, International Research and Training Centre of Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03186, Ukraine, aleksei@irtc.org.ua
Introduction. The article considers the implementation of a working model of a local cloud platform that provides two services at the SaaS level. The first service addresses optimizing the organization of workplaces within an organization. The second implements a model of a big data processing environment. The main problem is combining heterogeneous tasks within a common environment and redistributing computing resources between them.
Purpose. The purpose of this article is to build a model of a multi-purpose local cloud platform with flexible redistribution of computing power between workloads, and to consider two applications of the model: a service for terminal access to desktops and the Hadoop software platform for big data analysis. In addition, one element of a search engine, the search robot, is modeled as a big data analysis task.
Methods. Methods of modeling and abstraction were used.
Results. A functioning model of the local cloud platform is proposed. It provides a flexible mechanism for modeling and deploying platforms of a wide range of architectures, purposes and vendors. The possibilities of using existing search robot solutions are analyzed, and an experimental search robot of our own is developed. Both variants of the search robot deliver the necessary information in a form suitable for the subsequent elements of a search engine.
Conclusion. The model is proposed as a general solution for deploying private local clouds in enterprises. One possible application built on the platform for big data analytics is a search engine.
Keywords: cloud platform, processing large data, big data analytics, SaaS level.
Received 19.05.2017