R Server is an enterprise-class server for hosting and managing parallel and distributed workloads of R processes on servers (Linux and Windows) and clusters (Hadoop and Apache Spark). It provides an execution engine for solutions built using Microsoft R packages, extending open source R with support for high-performance analytics, statistical analysis, machine learning scenarios, and very large datasets. Value-added functionality is provided through proprietary packages that install with the server.
You can install R Server on a supported server or cluster, and use an R IDE such as R Tools for Visual Studio to adapt existing solutions or create new ones that use the additional capabilities. Although Microsoft R functions are not required in the solutions you deploy, you get the full value of Microsoft R when you use ScaleR technology and the other packages.
R Server is the next generation of Revolution R Enterprise, the server product that Microsoft acquired. It is distributed commercially for these platforms: Azure, Windows, Linux, Hadoop, Teradata, and SQL Server.
Develop and run R models on Hadoop/Apache Spark: scale your analysis transparently by distributing work across nodes without complex programming.
What is new in this release?
R Server for Hadoop
Support for Spark 1.6 and 2.0.
Support for Spark DataFrames through RxHiveData and RxParquetData in ScaleR when using an RxSpark compute context.
Additional new ScaleR functions for Spark 2.0:
Manage Spark persistent sessions: rxSparkConnect, rxSparkDisconnect
Manage data in Spark DataFrames: rxSparkListData, rxSparkRemoveData
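The functions listed above might be combined as in the following sketch. It assumes an R Server for Hadoop deployment with Spark and the RevoScaleR package installed, so it will not run on open source R alone. The wrapper name `summarize_hive_table` and its `query` argument are hypothetical, introduced only for illustration, and the defaults used by `rxSparkConnect` are assumed to suit your cluster.

```r
# Sketch only: needs R Server for Hadoop (Spark) with RevoScaleR installed.
# `summarize_hive_table` and `query` are hypothetical names for illustration.
summarize_hive_table <- function(query) {
  # Start a persistent Spark session; the returned object is the
  # RxSpark compute context for the calls that follow.
  cc <- RevoScaleR::rxSparkConnect()
  # Close the session when the function exits, even on error.
  on.exit(RevoScaleR::rxSparkDisconnect(cc), add = TRUE)

  # Define a Hive data source, which is backed by a Spark DataFrame
  # in an RxSpark compute context.
  hive_data <- RevoScaleR::RxHiveData(query = query)

  # Run a ScaleR analysis against the data source.
  result <- RevoScaleR::rxSummary(~ ., data = hive_data)

  # List the Spark DataFrames cached in this session, then release ours.
  RevoScaleR::rxSparkListData(showDescription = TRUE)
  RevoScaleR::rxSparkRemoveData(hive_data)

  result
}
```

On a configured cluster, a call might look like `summarize_hive_table("select * from my_table")`, where `my_table` is a placeholder for a Hive table that exists in your environment.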
Why use R Server?
R, along with many other statistical analysis products, is challenged by problems of capacity and speed. Users may be unable to perform an analysis because their data is too big to fit into memory, or because, even if the data fits, too little memory remains to analyze it. In R this is often a problem because copies of data are frequently made during analysis. Even without a capacity limit, computation may be too slow to be useful. R Server with ScaleR is designed to overcome both challenges, and it provides capabilities beyond those of other statistics products.
Data scientists who start with R Client or open source R typically move to R Server when data size or computational scale requires additional capacity.
R Server provides the infrastructure for distributing a workload across multiple nodes (referred to as data chunking), running jobs in parallel, and then reassembling the results for further analysis and visualization.
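The chunk, compute, and reassemble pattern described above can be illustrated with a minimal sketch in base R. It does not use RevoScaleR or any R Server API; `chunked_mean` and `chunk_size` are hypothetical names, and the chunks are processed sequentially here, where a distributed engine would hand them to separate workers.

```r
# Illustration only (base R, no RevoScaleR): compute a mean chunk by chunk.
# `chunked_mean` and `chunk_size` are hypothetical names.
chunked_mean <- function(x, chunk_size) {
  stopifnot(length(x) > 0, chunk_size >= 1)
  # Starting index of each consecutive chunk.
  starts <- seq(1, length(x), by = chunk_size)
  # Each chunk yields only a small partial result (sum and count),
  # so no step needs to operate on the whole vector at once.
  partials <- lapply(starts, function(s) {
    chunk <- x[s:min(s + chunk_size - 1, length(x))]
    c(sum = sum(chunk), n = length(chunk))
  })
  # Reassemble: add up the partial results, then derive the statistic.
  totals <- Reduce(`+`, partials)
  unname(totals["sum"] / totals["n"])
}

x <- c(2, 4, 6, 8, 10, 12, 14)
chunked_mean(x, chunk_size = 3)  # same value as mean(x)
```

The design point is that each chunk returns a partial result that can be combined later, which is what allows chunks to be processed independently and in any order.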
In addition to capacity and scale, R Server offers machine learning and operationalization features, both of which are new in this release.
Benefits of R Server
Reasons for choosing R Server include:
Chunked data across multiple disks
Increased threads for R worker processes running standard R packages and also ScaleR functions
Performance and scalability through parallelization and streaming
Supportability and service level agreements for mission-critical workloads
Machine learning algorithms and transforms
R script running as a standalone web service
Toggle between local and remote sessions on the command line
Operationalization engine for multi-server topologies with clustered web nodes and compute nodes
Interoperability with R language and across Microsoft R
R Server is built on open source R 3.3.2 and is 100% compatible with the R language. You can run any pure open source R solution on a Microsoft R Open, Microsoft R Client, or Microsoft R Server deployment.
Value-added packages like RevoScaleR, MicrosoftML, and mrsdeploy are available in both Microsoft R Client and Microsoft R Server. Although the packages are equally available, the infrastructure behind each product is substantially different. R Client is limited to in-memory data storage and can use a maximum of two processors.
R Server is the flagship product of the Microsoft R product family and supports much larger workloads. Data scientists typically switch to R Server when data and computational requirements cannot be accommodated on R Client.
Existing solutions developed with R Client can be deployed to R Server with minimal or no changes, but most developers make use of the additional capabilities, such as parallel and distributed computing, that become available when you upgrade to R Server.
In this release, the mrsdeploy package gives you the ability to toggle between remote and local sessions in an R console application. As you change the compute context and make other adjustments to increase data size, you can set up a remote session and issue commands to validate your changes incrementally.
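A remote session with mrsdeploy might look like the following sketch. It assumes the mrsdeploy package is installed and that an R Server instance configured for operationalization is reachable. The endpoint URL and the wrapper name `run_remote_check` are hypothetical, and `remoteLogin` may prompt for credentials depending on how the server is configured.

```r
# Sketch only: needs mrsdeploy and a reachable, operationalized R Server.
# `run_remote_check` and the endpoint value are hypothetical.
endpoint <- "http://localhost:12800"  # placeholder; replace with your server

run_remote_check <- function(endpoint, code) {
  # Open a remote R session without switching the console to the
  # remote prompt, so the script continues to run locally.
  mrsdeploy::remoteLogin(endpoint, session = TRUE, commandline = FALSE)
  # End the remote session when the function exits, even on error.
  on.exit(mrsdeploy::remoteLogout(), add = TRUE)
  # Run the given R code, supplied as a string, in the remote session.
  mrsdeploy::remoteExecute(code)
}

# Interactive alternative: log in with commandline = TRUE to work at the
# remote prompt, then use pause() to return to the local session and
# resume() to switch back to the remote one.
```

A call such as `run_remote_check(endpoint, "x <- 10; x * 2")` would then validate a small change remotely before you scale up the data size.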