Some real important differences to consider when you are choosing R or Python over one another:
https://datascience.stackexchange.com/questions/326/python-vs-r-for-machine-learning
- Machine Learning has 2 phases. Model Building and Prediction phase. Typically, model building is performed as a batch process and predictions are done realtime. The model building process is a compute intensive process while the prediction happens in a jiffy. Therefore, performance of an algorithm in Python or R doesn't really affect the turn-around time of the user. Python 1, R 1.
- Production: The real difference between Python and R comes in being production ready. Python, as such is a full fledged programming language and many organisations use it in their production systems. R is a statistical programming software favoured by many academia and due to the rise in data science and availability of libraries and being open source, the industry has started using R. Many of these organisations have their production systems either in Java, C++, C#, Python etc. So, ideally they would like to have the prediction system in the same language to reduce the latency and maintenance issues. Python 2, R 1.
- Libraries: Both the languages have enormous and reliable libraries. R has over 5000 libraries catering to many domains while Python has some incredible packages like Pandas, NumPy, SciPy, Scikit Learn, Matplotlib. Python 3, R 2.
- Development: Both the language are interpreted languages. Many say that python is easy to learn, it's almost like reading english (to put it on a lighter note) but R requires more initial studying effort. Also, both of them have good IDEs (Spyder etc for Python and RStudio for R). Python 4, R 2.
- Speed: R software initially had problems with large computations (say, like nxn matrix multiplications). But, this issue is addressed with the introduction of R by Revolution Analytics. They have re-written computation intensive operations in C which is blazingly fast. Python being a high level language is relatively slow. Python 4, R 3.
- Visualizations: In data science, we frequently tend to plot data to showcase patterns to users. Therefore, visualisations become an important criteria in choosing a software and R completely kills Python in this regard. Thanks to Hadley Wickham for an incredible ggplot2 package. R wins hands down. Python 4, R 4.
- Dealing with Big Data: One of the constraints of R is it stores the data in system memory (RAM). So, RAM capacity becomes a constraint when you are handling Big Data. Python does well, but I would say, as both R and Python have HDFS connectors, leveraging Hadoop infrastructure would give substantial performance improvement. So, Python 5, R 5.
https://datascience.stackexchange.com/questions/326/python-vs-r-for-machine-learning
No comments:
Post a Comment