Summary
Applications running on Cardano, as well as stake pool operators (SPOs), need computing power in the form of CPUs or GPUs. Currently, the only options are to rent cloud computing from big tech companies, which deepens reliance on them, or to purchase costly hardware setups. In an increasingly hostile and censorship-prone environment, it is essential to secure the reliability and decentralization of Cardano.
Computing needs in the Cardano ecosystem can broadly be divided into:
1. CPU requirements - Stake Pool Operators
2. GPU requirements - Artificial Intelligence (Machine Learning), Dapps, Metaverse, others.
Allowing decentralized computing on CPUs is a prerequisite for running Cardano nodes via NuNet, a project that was already awarded funding in Cardano Catalyst Fund7 as one of the top 20 voted proposals.
This Fund8 proposal pushes that work forward, expanding the scope to focus on the GPU aspect.
Source:
<https://cardano.ideascale.com/c/idea/383862>
<https://medium.com/nunet/decentralized-compute-for-spos-is-coming-aecdcbbc3fa7>
Overview
Utilization of GPUs by the NuNet platform will span two phases:
Foundation - Phase 1: One User Per GPU
Scaling - Phase 2: GPU Grid Computing
Phase 1: Foundation - One User Per GPU Model
This model involves enabling NuNet containers to support GPU access, monitoring GPU resource usage, and making GPUs directly available to the processes running inside the containers. In this model, the GPUs utilized will initially be those available on the specific provider device.
This model has its use-cases and would allow ML model training and inference when the available GPU is capable of handling the workload by itself. Additionally, it would serve as guidance for the next phases of development by allowing the core development work to be performed: supporting GPU device onboarding to NuNet, enabling the NuNet Adapter to manage GPUs, implementing GPU access from within virtual machines and containers, and monitoring GPU resource usage for provider compensation.
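To illustrate the metering side of this phase, the sketch below (Python, with hypothetical names — not NuNet's actual implementation) aggregates periodic GPU utilization samples into a busy-time total that could feed provider compensation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GpuSample:
    """One periodic reading from a provider's GPU (hypothetical schema)."""
    timestamp_s: float      # when the sample was taken
    utilization_pct: float  # 0-100, fraction of the GPU busy
    memory_used_mb: float   # device memory in use

@dataclass
class UsageMeter:
    """Accumulates samples and summarizes billable GPU usage."""
    samples: List[GpuSample] = field(default_factory=list)

    def record(self, sample: GpuSample) -> None:
        self.samples.append(sample)

    def gpu_seconds(self) -> float:
        """Approximate busy GPU-seconds by integrating utilization
        between consecutive samples (left Riemann sum)."""
        total = 0.0
        for prev, cur in zip(self.samples, self.samples[1:]):
            dt = cur.timestamp_s - prev.timestamp_s
            total += dt * prev.utilization_pct / 100.0
        return total

meter = UsageMeter()
meter.record(GpuSample(0.0, 50.0, 1024))
meter.record(GpuSample(10.0, 100.0, 2048))
meter.record(GpuSample(20.0, 0.0, 512))
# 10 s at 50% + 10 s at 100% = 15 busy GPU-seconds
print(meter.gpu_seconds())
```

In a real deployment the samples would come from a driver-level interface such as NVML rather than being constructed by hand.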
Regular personal computers typically lack the GPU capacity for large workloads, so this model will be limited in its ability to support large-scale ML projects, and especially federated learning, where data must not be transmitted to the device hosting the GPU. A model that decouples data storage from the GPU device used for training is necessary, so that users do not have to upload their data to a Provider's device in order to perform training. It should be possible to relay only those tasks and processes that need GPU execution to Providers' devices, without transmitting the full training data, i.e. the process (code) is transmitted rather than the data.
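The decoupling can be sketched as follows (Python, illustrative names only — not a NuNet API): the requester ships a task description, i.e. a code reference plus parameters, to the provider; the provider runs it against data it already holds; and only the result travels back, so the raw dataset never crosses the network.

```python
import json

# Registry of operations the provider is willing to run (assumed convention).
OPERATIONS = {
    "mean": lambda data: sum(data) / len(data),
    "max": max,
}

def make_task(op: str, **params) -> str:
    """Requester side: serialize a task description, not the data."""
    return json.dumps({"op": op, "params": params})

def run_task(task_json: str, local_data) -> float:
    """Provider side: execute the named operation on locally held data."""
    task = json.loads(task_json)
    return OPERATIONS[task["op"]](local_data)

# The dataset lives only on the provider's device.
provider_data = [3.0, 5.0, 10.0]
task = make_task("mean")      # what actually goes over the wire
result = run_task(task, provider_data)
print(result)  # 6.0
```

Note that `task` contains only the operation name and parameters; the data values never appear in it.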
Phase 1 is the scope of the present Cardano Catalyst Fund8 proposal.
Source:
<https://arxiv.org/pdf/2103.08894.pdf>
Phase 2: Scaling - GPU Grid Computing
This model involves accumulating massive amounts of processing power by virtualizing GPUs and aggregating them in a pool where end users of these GPUs have access to a cluster instead of a single device.
Technically, this will be implemented in two interconnected steps:
Phase 2A: Splitting jobs into manageable tasks
Phase 2B: Assigning a cluster of virtual GPUs to workloads
Phase 2A: Splitting Jobs
This method involves three main components:
- Worker: This component performs the actual work. It is essentially a single procedure executed on a GPU.
- Work Manager: This component performs task splitting. It accepts large jobs, splits them into individually processable tasks, and dispatches them to Workers across Provider devices.
- Job Dispatcher: This component submits the full job to be executed to the Work Manager.
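The interplay of the three components can be sketched in a few lines of Python (component names follow the text above, not actual NuNet APIs; a real Worker would execute its task on a GPU rather than in-process):

```python
from typing import Callable, List

class Worker:
    """Performs the actual work: one procedure over one task's data."""
    def execute(self, fn: Callable, chunk: List[float]) -> List[float]:
        return [fn(x) for x in chunk]

class WorkManager:
    """Splits a large job into tasks and dispatches them to Workers."""
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def run_job(self, fn: Callable, data: List[float]) -> List[float]:
        n = len(self.workers)
        size = -(-len(data) // n)  # ceil division: items per worker
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        results = [w.execute(fn, c) for w, c in zip(self.workers, chunks)]
        return [x for part in results for x in part]  # recombine in order

class JobDispatcher:
    """Submits the full job to the Work Manager."""
    def __init__(self, manager: WorkManager):
        self.manager = manager

    def submit(self, fn: Callable, data: List[float]) -> List[float]:
        return self.manager.run_job(fn, data)

dispatcher = JobDispatcher(WorkManager([Worker(), Worker(), Worker()]))
print(dispatcher.submit(lambda x: x * x, [1, 2, 3, 4, 5, 6]))
# [1, 4, 9, 16, 25, 36]
```

Here the Work Manager's only policy is even chunking; in practice splitting, scheduling, and retry logic would be considerably more involved.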
In order to develop this method successfully, the initial programming of the ML tasks must be adapted: it is necessary to ensure that the Work Manager can split jobs into individually executable tasks. This can be achieved, for example, by building a library with a high-level API over NumPy in which certain operations are overloaded to be splittable. Developers can then write code much as they are used to, but would have to use certain recommended functions and data structures.
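For instance, a splittable matrix multiply in such a library might partition the left operand by row blocks, treat each block as an independently dispatchable task, and stack the partial results, giving the same answer as the plain NumPy call (a sketch with assumed names, not an existing NuNet library):

```python
import numpy as np

def split_matmul(a: np.ndarray, b: np.ndarray, n_tasks: int) -> np.ndarray:
    """Splittable matmul: each row block of `a` is an independent task
    that a Worker could execute; results recombine by stacking."""
    row_blocks = np.array_split(a, n_tasks, axis=0)   # split the job
    partials = [block @ b for block in row_blocks]    # one task per block
    return np.vstack(partials)                        # recombine

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 4))
b = rng.standard_normal((4, 3))
assert np.allclose(split_matmul(a, b, n_tasks=3), a @ b)
```

Row-block splitting works because each output row of `a @ b` depends only on the corresponding row of `a`; operations without such independent structure would need different splitting strategies.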
Phase 2B: Cluster of Virtual GPUs
This method will virtualize all GPUs available on the NuNet platform and make them available to containers running ML tasks as if they were physical GPUs located on that virtual machine. This method will not involve building a task splitter, since the splitting, scheduling, and prioritization of tasks will be handled by the low-level APIs themselves.
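Conceptually, the virtualization layer enumerates GPUs from many providers and presents them to the workload as a single local device list (a purely illustrative Python sketch; real GPU virtualization happens at the driver and API level, not in application code):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RemoteGpu:
    provider_id: str   # which NuNet provider hosts the physical card
    model: str
    memory_gb: int

class VirtualGpuPool:
    """Aggregates provider GPUs and exposes them as local device indices."""
    def __init__(self):
        self._gpus: List[RemoteGpu] = []

    def register(self, gpu: RemoteGpu) -> None:
        self._gpus.append(gpu)

    def device_count(self) -> int:
        # What a framework inside the container would see as "local" GPUs.
        return len(self._gpus)

    def device(self, index: int) -> RemoteGpu:
        return self._gpus[index]

pool = VirtualGpuPool()
pool.register(RemoteGpu("provider-a", "RTX 3080", 10))
pool.register(RemoteGpu("provider-b", "RTX 3090", 24))
print(pool.device_count())  # 2
```

The point is that the ML framework addresses devices by index as usual, while the pool hides which provider each device physically lives on.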
It is based on the following use-case worked out with DeepChainADA: <https://github.com/nunet-io/simple-ML-on-GPU/issues/1>
The description of Phase 2 is given here in order to understand the long term potential and plan for building fundamentals (Phase 1). The current proposal does not include Phase 2 scope, which will be submitted for further Catalyst Funds based on the success of Phase 1.
GPU requirements - Artificial Intelligence (Machine Learning)
Training a machine learning (ML) model requires a lot of processing power, which can be costly or difficult to obtain. In Cardano Catalyst Fund7, a proposal was funded that enables Decentralized Federated Machine Learning, ensuring privacy to allow open collaboration. That proposal will need GPU power to train its ML models, and is just one example of the potential usage of decentralized GPU power provided by NuNet. Furthermore, inference on those models is less computationally expensive, but still needs considerable GPU compute resources and is somewhat more amenable to decentralization.
DeepchainAda: Trustless AI training
Source:
<https://app.ideascale.com/t/UM5UZBqdc>
What is Machine Learning?
Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
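A tiny example of "improving through experience and data": the pure-Python snippet below fits a one-parameter model y = w·x by gradient descent, so the value of w is learned from examples rather than programmed explicitly.

```python
# Training data generated by the (unknown to the model) relationship y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0    # model parameter, initially a guess
lr = 0.02  # learning rate

for _ in range(500):  # repeatedly learn from the data
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad    # step against the error gradient

print(round(w, 3))  # close to 3.0: learned, not hand-coded
```

Deep learning models work on the same principle but with millions of parameters, which is where the GPU requirements discussed next come from.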
Source:
<https://en.wikipedia.org/wiki/Machine_learning>
Why GPUs for Machine Learning?
GPUs are optimized for training artificial intelligence and deep learning models because they can process many computations simultaneously.
They have a large number of cores, which allows many parallel processes to be computed efficiently. Additionally, computations in deep learning must handle huge amounts of data, and a GPU's high memory bandwidth makes it well suited to this.
Source:
<https://towardsdatascience.com/what-is-a-gpu-and-do-you-need-one-in-deep-learning-718b9597aa0d>
Inside the cryptocurrency industry there are many providers of CPU and GPU hardware whose capacity can easily be diverted to training ML models. NuNet's proposal will enable tapping into that huge potential market (e.g. ETH miners) and linking it to demand in the Cardano ecosystem.
NuNet, a spin-off of SingularityNET, allows arbitrary computing workflows to run on community-provisioned hardware and provides payment gateways directly from software or applications via Cardano Plutus smart contracts. Adding the ability to source decentralized GPU computing resources via the NuNet ecosystem will tap into a huge and expanding part of the global computing infrastructure, powering the growing AI industry as well as the emerging Metaverse industry. NuNet's ability to connect decentralized hardware into a single workflow is an attractive possibility for these industries.
This would greatly increase the possibilities of the growing ecosystem on Cardano as already witnessed by the needs of DeepchainAda: Trustless AI training. NuNet can provide resilience and true decentralization through the Cardano network both in CPU and GPU computing domains.
The proposal addresses the Challenge goals in terms of:
- Deployment, testing, and monitoring frameworks
- Support structures
- Incentivization structures
To summarize, this proposal brings value to Cardano by enabling flexible, decentralized, robust, faster or cheaper CPU and GPU resources as a computing framework to support the Cardano ecosystem.
Risk 1: Mostly general technical research and development uncertainties, and the complexity of the project on that side. We are fairly confident that the team will be able to deal with difficulties, but this may require additional time and work.
Risk 2: Complexities of deployment with the pilot partner. To be mitigated by the possibility of including more testing partners from the NuNet open-source community.
Risk 3: Increased hardware prices and uncertainty in the GPU device market. To be mitigated by focused monitoring of price swings and acquiring hardware when prices are lowest.