As machine learning algorithms and techniques have advanced, more and more machine learning applications require multiple machines and must exploit parallelism. However, the infrastructure for doing machine learning on clusters remains ad-hoc. While good solutions for specific use cases (e.g., parameter servers or hyperparameter search) and high-quality distributed systems outside of AI do exist (e.g., Hadoop or Spark), practitioners developing algorithms at the frontier often build their own systems infrastructure from scratch. This amounts to a lot of redundant effort.

As an example, take a conceptually simple algorithm like Evolution Strategies for reinforcement learning. The algorithm is about a dozen lines of pseudocode, and its Python implementation doesn't take much more than that. However, running the algorithm efficiently on a larger machine or cluster requires significantly more software engineering. The authors' implementation involves thousands of lines of code and must define communication protocols, message serialization and deserialization strategies, and various data handling strategies. One of Ray's goals is to enable practitioners to turn a prototype algorithm that runs on a laptop into a high-performance distributed application that runs efficiently on a cluster (or on a single multi-core machine) with relatively few additional lines of code. Such a framework should include the performance benefits of a hand-optimized system without requiring the user to reason about scheduling, data transfers, and machine failures.

Relation to deep learning frameworks: Ray is fully compatible with deep learning frameworks like TensorFlow, PyTorch, and MXNet, and it is natural to use one or more deep learning frameworks along with Ray in many applications (for example, our reinforcement learning libraries use TensorFlow and PyTorch heavily).
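To make both points concrete, here is a minimal sketch of Evolution Strategies with its fitness evaluations parallelized via Ray. The `fitness` function, parameter dimension, and hyperparameters are hypothetical placeholders chosen for illustration, not the authors' implementation; only `ray.init`, `@ray.remote`, `.remote()`, and `ray.get` are Ray's actual core API.

```python
import numpy as np
import ray

ray.init()

def fitness(theta):
    # Hypothetical toy objective: maximize negative squared distance
    # to a fixed target vector. A real application would run a policy
    # rollout here.
    target = np.ones(10)
    return -np.sum((theta - target) ** 2)

# Evaluating one perturbed parameter vector is the expensive,
# embarrassingly parallel step. Wrapping it in @ray.remote lets Ray
# schedule each call on any core or machine in the cluster.
@ray.remote
def evaluate(theta, seed, sigma):
    eps = np.random.RandomState(seed).randn(*theta.shape)
    return fitness(theta + sigma * eps)

def evolution_strategies(num_iters=200, pop_size=50, sigma=0.1, alpha=0.01):
    theta = np.zeros(10)
    for _ in range(num_iters):
        seeds = np.random.randint(0, 2**31 - 1, size=pop_size)
        # Launch all fitness evaluations in parallel; ray.get blocks
        # until every result is ready.
        returns = np.array(ray.get(
            [evaluate.remote(theta, int(s), sigma) for s in seeds]))
        # Standardize returns, then regenerate each perturbation from
        # its seed to form the ES gradient estimate.
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        grad = np.zeros_like(theta)
        for s, adv in zip(seeds, advantages):
            grad += adv * np.random.RandomState(int(s)).randn(*theta.shape)
        theta += alpha / (pop_size * sigma) * grad
    return theta

print(evolution_strategies())
```

Deleting `ray.init()`, the `@ray.remote` decorator, and the `.remote()`/`ray.get` wrappers recovers an ordinary serial Python prototype, which is the sense in which distributing an algorithm with Ray takes only a few additional lines of code.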