Microsoft R Server 9.0.1 For Hadoop 64-bit (English) - Microsoft Imagine

Manufacturer:

Microsoft Corporation

Platforms:

Windows

Delivery Type:

Download

Available to:

Academic Users

R Server is an enterprise-class server for hosting and managing parallel and distributed workloads of R processes on servers (Linux and Windows) and clusters (Hadoop and Apache Spark). It provides an execution engine for solutions built using Microsoft R packages, extending open source R with support for high-performance analytics, statistical analysis, machine learning scenarios, and very large datasets. Value-added functionality is provided through proprietary packages that install with the server.

You can install R Server on a supported server or cluster, and use an R IDE like R Tools for Visual Studio to adapt or create solutions to use additional capabilities. Although Microsoft R functions are not required in the solutions you deploy, the full value of Microsoft R is realized when you use ScaleR technology and other packages.

R Server is the next generation of the former Revolution R Enterprise server, acquired by Microsoft and distributed commercially for these platforms: Azure, Windows, Linux, Hadoop, Teradata, SQL Server.

Develop and run R models on Hadoop and Apache Spark: scale your analysis transparently by distributing work across nodes, without complex programming.

What is new in this release?

R Server for Hadoop

  • Support for Spark 1.6 and 2.0.
  • Support for Spark DataFrames through RxHiveData and RxParquetData in ScaleR when using an RxSpark compute context.

Additional new ScaleR functions for Spark 2.0:

  • Manage Spark persistent sessions: rxSparkConnect, rxSparkDisconnect
  • Manage data in Spark DataFrames: rxSparkListData, rxSparkRemoveData

Why use R Server?

R, along with many other statistical analysis products, is challenged by problems of capacity and speed. Users often cannot perform data analysis because their data is too big to fit into memory or, even if it fits, not enough memory remains to perform the analysis. In R this is often a problem because copies of data are frequently made during analysis. Even without a capacity limit, computation may be too slow to be useful. R Server with ScaleR not only helps to overcome these challenges in R, but also surpasses the capabilities of other statistics products.

Data scientists who start with R Client or open source R typically move to R Server when data size or computational scale require additional capacity.

R Server provides the infrastructure for distributing a workload across multiple nodes (referred to as data chunking), running jobs in parallel, and then reassembling the results for further analysis and visualization.
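
The split, process in parallel, and reassemble pattern described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not how ScaleR is implemented: the function names (split_into_chunks, summarize_chunk, combine, chunked_summary) are hypothetical, and local threads stand in for the worker nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

# Partial result for one chunk: (count, sum, minimum, maximum).
Partial = Tuple[int, float, float, float]


def split_into_chunks(values: Sequence[float], chunk_size: int) -> List[Sequence[float]]:
    """Cut the data into consecutive blocks of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def summarize_chunk(chunk: Sequence[float]) -> Partial:
    """Compute statistics that can later be merged across chunks."""
    return (len(chunk), sum(chunk), min(chunk), max(chunk))


def combine(partials: List[Partial]) -> Dict[str, float]:
    """Reassemble the per-chunk results into one overall summary."""
    n = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return {
        "n": n,
        "mean": total / n,
        "min": min(p[2] for p in partials),
        "max": max(p[3] for p in partials),
    }


def chunked_summary(values: Sequence[float], chunk_size: int = 1000,
                    workers: int = 4) -> Dict[str, float]:
    """Summarize values chunk by chunk; threads stand in for worker nodes."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize_chunk, chunks))
    return combine(partials)
```

For example, chunked_summary(list(range(1, 101)), chunk_size=7) processes fifteen chunks and reports n = 100 and a mean of 50.5. The key design point is that each chunk returns only a small mergeable summary, so the full dataset never has to sit in one process's memory at once.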

In addition to capacity and scale, R Server offers machine learning and operationalization features, both of which are new in this release.

Benefits of R Server

Reasons for choosing R Server include:

  • Chunked data across multiple disks
  • Increased threads for R worker processes running standard R packages and also ScaleR functions
  • Performance and scalability through parallelization and streaming
  • Supportability and service level agreements for mission-critical workloads
  • Machine learning algorithms and transforms
  • R script running as a standalone web service
  • Toggle between local and remote sessions on the command line
  • Operationalization engine for multi-server topologies with clustered web nodes and compute nodes

Interoperability with R language and across Microsoft R

R Server is built on open source R 3.3.2 and is 100% compatible with the R language. You can run any pure open source R solution on a Microsoft R Open, Microsoft R Client, or Microsoft R Server deployment.

Value-added packages like RevoScaleR, MicrosoftML, and mrsdeploy are available in both Microsoft R Client and Microsoft R Server. Although packages are equally available, the infrastructure backed by each product is substantially different. R Client is limited to in-memory data storage and can use a maximum of two processors.

R Server is the flagship product of the Microsoft R product family and supports much larger workloads. Data scientists typically switch to R Server when data and computational requirements cannot be accommodated on R Client.

Existing solutions developed with R Client can be deployed to R Server with minimal or no changes, but most developers make use of the additional functions, such as parallel and distributed computing, that become available when you upgrade to R Server.

In this release, the mrsdeploy package gives you the ability to toggle between remote and local sessions in an R console application. As you change the compute context and make other adjustments to increase data size, you can set up a remote session and issue commands to validate your changes incrementally.

Microsoft R Server 9.0.1 for Hadoop


R Server must be installed on at least one master or client node, which serves as the submit node. To maximize the available compute resources, install it on as many worker nodes as is practical. All nodes in the cluster must run the same version of R Server.

Setup checks the operating system and detects the Hadoop cluster, but it doesn't check for specific distributions. Microsoft R Server works with the Hadoop distributions listed here:

  • Hadoop Distributions: Cloudera CDH 5.5-5.8, Hortonworks HDP 2.3-2.5, MapR 5.0-5.2
  • Operating Systems: RHEL 6.x and 7.x, SUSE SLES11, Ubuntu 14.04 (excluding Cloudera Parcel install on Ubuntu). Note: All supported operating systems are 64-bit only.
  • Spark versions: 1.6 and 2.0. Not all supported versions of Hadoop include a supported level of Spark. Specifically, HDP must be at least 2.3.4 to get a supported level of Spark.

Microsoft R Server requires Hadoop MapReduce, the Hadoop Distributed File System (HDFS), and Apache YARN. Spark is optional; Microsoft R Server 9.0.1 supports Spark 1.6 and 2.0.

Processor: 64-bit CPU with x86-compatible architecture (variously known as AMD64, Intel64, x86-64, IA-32e, EM64T, or x64 CPUs). Itanium-architecture CPUs (also known as IA-64) are not supported. Multiple-core CPUs are recommended.

Memory: A minimum of 8 GB of RAM is required for Microsoft R Server; 16 GB or more is recommended. Hadoop itself has substantial memory requirements; see your Hadoop distribution’s documentation for specific recommendations.

Disk Space: A minimum of 500 MB of disk space is required on each node for R Server. Hadoop itself has substantial disk space requirements; see your Hadoop distribution’s documentation for specific recommendations.
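
Because Setup does not check for specific distributions, it can help to encode the requirements listed above as a pre-install checklist. The following is a minimal sketch in Python, not a Microsoft tool: NodeSpec, check_node, and parse_version are hypothetical names, the node values are supplied by hand rather than detected, and the thresholds are copied from this page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds taken from the requirements listed on this page.
MIN_RAM_GB = 8
MIN_DISK_MB = 500
SUPPORTED_SPARK = {"1.6", "2.0"}
# Inclusive (major, minor) ranges per Hadoop distribution.
SUPPORTED_DISTROS = {
    "CDH": ((5, 5), (5, 8)),
    "HDP": ((2, 3), (2, 5)),
    "MapR": ((5, 0), (5, 2)),
}
# HDP must be at least 2.3.4 to include a supported level of Spark.
MIN_HDP_FOR_SPARK = (2, 3, 4)


@dataclass
class NodeSpec:
    """Hypothetical, hand-entered description of one cluster node."""
    distro: str              # "CDH", "HDP", or "MapR"
    distro_version: str      # for example "5.7" or "2.3.4"
    ram_gb: float
    free_disk_mb: float
    is_64bit: bool
    spark_version: Optional[str] = None  # None when Spark is not used


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '2.3.4' into (2, 3, 4)."""
    return tuple(int(part) for part in text.split("."))


def check_node(spec: NodeSpec) -> List[str]:
    """Return a list of unmet requirements; an empty list means none found."""
    problems: List[str] = []
    version = parse_version(spec.distro_version)
    bounds = SUPPORTED_DISTROS.get(spec.distro)
    if bounds is None:
        problems.append(f"unknown distribution: {spec.distro}")
    elif not (bounds[0] <= version[:2] <= bounds[1]):
        problems.append(f"{spec.distro} {spec.distro_version} is outside the supported range")
    if not spec.is_64bit:
        problems.append("a 64-bit x86-compatible CPU and operating system are required")
    if spec.ram_gb < MIN_RAM_GB:
        problems.append(f"at least {MIN_RAM_GB} GB of RAM is required")
    if spec.free_disk_mb < MIN_DISK_MB:
        problems.append(f"at least {MIN_DISK_MB} MB of disk space is required")
    if spec.spark_version is not None:
        spark_major_minor = ".".join(spec.spark_version.split(".")[:2])
        if spark_major_minor not in SUPPORTED_SPARK:
            problems.append(f"Spark {spec.spark_version} is not supported")
        if spec.distro == "HDP" and version < MIN_HDP_FOR_SPARK:
            problems.append("HDP must be at least 2.3.4 for a supported level of Spark")
    return problems
```

A node described as NodeSpec("CDH", "5.7", 16, 1024, True, "2.0") passes with no problems reported, while an HDP 2.3.2 node with 4 GB of RAM and 100 MB of free disk that wants Spark 1.6 is flagged three times. The sketch only covers the checks stated on this page; it does not replace the Hadoop distribution's own sizing guidance.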
