Summary
Applications running on Cardano, as well as stake pool operators (SPOs), need computing power in the form of CPUs or GPUs. Currently, the only options are to rent cloud computing from big tech companies, which deepens reliance on them, or to purchase costly hardware setups. In an increasingly hostile and censorship-prone environment, it is essential to secure the reliability and decentralization of Cardano.
Computing needs in the Cardano ecosystem can broadly be divided into:
1. CPU requirements - Stake Pool Operators
2. GPU requirements - Artificial Intelligence (Machine Learning), Dapps, Metaverse, others.
Allowing decentralized computing on CPUs is a prerequisite for running Cardano nodes via NuNet, a project that was already awarded funding in Cardano Catalyst Fund7 as one of the top 20 voted proposals.
This Fund8 proposal pushes that work forward, expanding the scope to focus on the GPU aspect.
Source:
<https://cardano.ideascale.com/c/idea/383862>
<https://medium.com/nunet/decentralized-compute-for-spos-is-coming-aecdcbbc3fa7>
Overview
Utilization of GPUs by the NuNet platform will span two phases:
Foundation - Phase 1: One User Per GPU
Scaling - Phase 2: GPU Grid Computing
Phase 1: Foundation - One User Per GPU Model
This model involves enabling NuNet containers to support GPU access, monitoring GPU resource usage, and making GPUs directly available to the processes running inside the containers. In this model, the GPUs utilized will initially be those available on the specific provider device.
This model has its use-cases and would allow ML model training and inference when the available GPU is capable of handling the workload by itself. Additionally, it would serve as guidance for the next phases of development by allowing the core development work to be performed: supporting GPU device onboarding to NuNet, enabling the NuNet Adapter to manage GPUs, implementing GPU access from within virtual machines and containers, and monitoring GPU resource usage for provider compensation.
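To illustrate the metering side of this phase, the sketch below (Python, with hypothetical names — not NuNet's actual implementation) aggregates periodic GPU utilization samples into a busy-time total that could feed provider compensation:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GpuSample:
    """One periodic reading from a provider's GPU (hypothetical schema)."""
    timestamp_s: float      # when the sample was taken
    utilization_pct: float  # 0-100, fraction of the GPU busy
    memory_used_mb: float   # device memory in use

@dataclass
class UsageMeter:
    """Accumulates samples and summarizes billable GPU usage."""
    samples: List[GpuSample] = field(default_factory=list)

    def record(self, sample: GpuSample) -> None:
        self.samples.append(sample)

    def gpu_seconds(self) -> float:
        """Approximate busy GPU-seconds by integrating utilization
        between consecutive samples (left Riemann sum)."""
        total = 0.0
        for prev, cur in zip(self.samples, self.samples[1:]):
            dt = cur.timestamp_s - prev.timestamp_s
            total += dt * prev.utilization_pct / 100.0
        return total

meter = UsageMeter()
meter.record(GpuSample(0.0, 50.0, 1024))
meter.record(GpuSample(10.0, 100.0, 2048))
meter.record(GpuSample(20.0, 0.0, 512))
# 10 s at 50% + 10 s at 100% = 15 busy GPU-seconds
print(meter.gpu_seconds())
```

In a real deployment the samples would come from a driver-level interface such as NVML rather than being constructed by hand.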
Regular personal computers typically lack the GPU capacity for large workloads, so this model will be limited in its ability to support large-scale ML projects, and especially federated learning, where data must not be transmitted to the device hosting the GPU. A model that decouples data storage from the GPU device used for training is necessary, so that users do not have to upload their data to a Provider's device in order to perform training. It should be possible to relay only those tasks and processes that need GPU execution to Providers' devices, without transmitting the full training data, i.e. the process (code) is transmitted rather than the data.
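The decoupling can be sketched as follows (Python, illustrative names only — not a NuNet API): the requester ships a task description, i.e. a code reference plus parameters, to the provider; the provider runs it against data it already holds; and only the result travels back, so the raw dataset never crosses the network.

```python
import json

# Registry of operations the provider is willing to run (assumed convention).
OPERATIONS = {
    "mean": lambda data: sum(data) / len(data),
    "max": max,
}

def make_task(op: str, **params) -> str:
    """Requester side: serialize a task description, not the data."""
    return json.dumps({"op": op, "params": params})

def run_task(task_json: str, local_data) -> float:
    """Provider side: execute the named operation on locally held data."""
    task = json.loads(task_json)
    return OPERATIONS[task["op"]](local_data)

# The dataset lives only on the provider's device.
provider_data = [3.0, 5.0, 10.0]
task = make_task("mean")      # what actually goes over the wire
result = run_task(task, provider_data)
print(result)  # 6.0
```

Note that `task` contains only the operation name and parameters; the data values never appear in it.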
Phase 1 is the scope of the present Cardano Catalyst Fund8 proposal.
Source:
<https://arxiv.org/pdf/2103.08894.pdf>
Phase 2: Scaling - GPU Grid Computing
This model involves accumulating massive amounts of processing power by virtualizing GPUs and aggregating them in a pool where end users of these GPUs have access to a cluster instead of a single device.
Technically, this will be implemented in two interconnected steps:
Phase 2A: Splitting jobs into manageable tasks
Phase 2B: Assigning a cluster of virtual GPUs to workloads
Phase 2A: Splitting Jobs
This method involves three main components:
- Worker: This component performs the actual work. It is essentially a single procedure executed on a GPU.
- Work Manager: This component performs task splitting. It accepts large jobs, splits them into individually processable tasks, and dispatches them to Workers across Provider devices.
- Job Dispatcher: This component submits the full job to be executed to the Work Manager.
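The interplay of the three components can be sketched in a few lines of Python (component names follow the text above, not actual NuNet APIs; a real Worker would execute its task on a GPU rather than in-process):

```python
from typing import Callable, List

class Worker:
    """Performs the actual work: one procedure over one task's data."""
    def execute(self, fn: Callable, chunk: List[float]) -> List[float]:
        return [fn(x) for x in chunk]

class WorkManager:
    """Splits a large job into tasks and dispatches them to Workers."""
    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def run_job(self, fn: Callable, data: List[float]) -> List[float]:
        n = len(self.workers)
        size = -(-len(data) // n)  # ceil division: items per worker
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        results = [w.execute(fn, c) for w, c in zip(self.workers, chunks)]
        return [x for part in results for x in part]  # recombine in order

class JobDispatcher:
    """Submits the full job to the Work Manager."""
    def __init__(self, manager: WorkManager):
        self.manager = manager

    def submit(self, fn: Callable, data: List[float]) -> List[float]:
        return self.manager.run_job(fn, data)

dispatcher = JobDispatcher(WorkManager([Worker(), Worker(), Worker()]))
print(dispatcher.submit(lambda x: x * x, [1, 2, 3, 4, 5, 6]))
# [1, 4, 9, 16, 25, 36]
```

Here the Work Manager's only policy is even chunking; in practice splitting, scheduling, and retry logic would be considerably more involved.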
In order to develop this method successfully, the initial programming of the ML tasks must be adapted: it is necessary to ensure that the Work Manager can split jobs into individually executable tasks. This can be achieved, for example, by building a library with a high-level API over NumPy in which certain operations are overloaded to be splittable. Developers can then write code much as they are used to, but would have to use certain recommended functions and data structures.
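For instance, a splittable matrix multiply in such a library might partition the left operand by row blocks, treat each block as an independently dispatchable task, and stack the partial results, giving the same answer as the plain NumPy call (a sketch with assumed names, not an existing NuNet library):

```python
import numpy as np

def split_matmul(a: np.ndarray, b: np.ndarray, n_tasks: int) -> np.ndarray:
    """Splittable matmul: each row block of `a` is an independent task
    that a Worker could execute; results recombine by stacking."""
    row_blocks = np.array_split(a, n_tasks, axis=0)   # split the job
    partials = [block @ b for block in row_blocks]    # one task per block
    return np.vstack(partials)                        # recombine

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 4))
b = rng.standard_normal((4, 3))
assert np.allclose(split_matmul(a, b, n_tasks=3), a @ b)
```

Row-block splitting works because each output row of `a @ b` depends only on the corresponding row of `a`; operations without such independent structure would need different splitting strategies.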
Phase 2B: Cluster of Virtual GPUs
This method will virtualize all GPUs available on the NuNet platform and make them available to containers running ML tasks as if they were physical GPUs located on that virtual machine. This method will not involve building a task splitter, since the splitting, scheduling, and prioritization of tasks will be handled by the low-level APIs themselves.
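Conceptually, the virtualization layer enumerates GPUs from many providers and presents them to the workload as a single local device list (a purely illustrative Python sketch; real GPU virtualization happens at the driver and API level, not in application code):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RemoteGpu:
    provider_id: str   # which NuNet provider hosts the physical card
    model: str
    memory_gb: int

class VirtualGpuPool:
    """Aggregates provider GPUs and exposes them as local device indices."""
    def __init__(self):
        self._gpus: List[RemoteGpu] = []

    def register(self, gpu: RemoteGpu) -> None:
        self._gpus.append(gpu)

    def device_count(self) -> int:
        # What a framework inside the container would see as "local" GPUs.
        return len(self._gpus)

    def device(self, index: int) -> RemoteGpu:
        return self._gpus[index]

pool = VirtualGpuPool()
pool.register(RemoteGpu("provider-a", "RTX 3080", 10))
pool.register(RemoteGpu("provider-b", "RTX 3090", 24))
print(pool.device_count())  # 2
```

The point is that the ML framework addresses devices by index as usual, while the pool hides which provider each device physically lives on.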
It is based on the following use-case worked out with DeepChainADA: <https://github.com/nunet-io/simple-ML-on-GPU/issues/1>
The description of Phase 2 is given here in order to understand the long term potential and plan for building fundamentals (Phase 1). The current proposal does not include Phase 2 scope, which will be submitted for further Catalyst Funds based on the success of Phase 1.
GPU requirements - Artificial Intelligence (Machine Learning)
Training a machine learning (ML) model requires a lot of processing power, which can be costly or difficult to obtain. In Cardano Catalyst Fund7, a proposal was funded that enables Decentralized Federated Machine Learning, ensuring privacy to allow open collaboration. That proposal will need GPU power to train its ML models, and is just one example of the potential usage of decentralized GPU power provided by NuNet. Furthermore, inference on those models is less computationally expensive, but still needs considerable GPU compute resources and is somewhat more amenable to decentralization.
DeepchainAda: Trustless AI training
Source:
<https://app.ideascale.com/t/UM5UZBqdc>
What is Machine Learning?
Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.
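A tiny example of "improving through experience and data": the pure-Python snippet below fits a one-parameter model y = w·x by gradient descent, so the value of w is learned from examples rather than programmed explicitly.

```python
# Training data generated by the (unknown to the model) relationship y = 3x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0    # model parameter, initially a guess
lr = 0.02  # learning rate

for _ in range(500):  # repeatedly learn from the data
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad    # step against the error gradient

print(round(w, 3))  # close to 3.0: learned, not hand-coded
```

Deep learning models work on the same principle but with millions of parameters, which is where the GPU requirements discussed next come from.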
Source:
<https://en.wikipedia.org/wiki/Machine_learning>
Why GPUs for Machine Learning?
GPUs are optimized for training artificial intelligence and deep learning models because they can process many computations simultaneously.
They have a large number of cores, which allows many parallel processes to be computed efficiently. Additionally, computations in deep learning must handle huge amounts of data, and a GPU's high memory bandwidth makes it well suited to this.
Source:
<https://towardsdatascience.com/what-is-a-gpu-and-do-you-need-one-in-deep-learning-718b9597aa0d>
Inside the cryptocurrency industry there are many providers of CPU and GPU hardware whose capacity can easily be diverted to training ML models. NuNet's proposal will enable tapping into that huge potential market (e.g. ETH miners) and linking it to demand in the Cardano ecosystem.
NuNet, a spin-off of SingularityNET, allows arbitrary computing workflows to run on community-provisioned hardware and provides payment gateways directly from software or applications via Cardano Plutus smart contracts. Adding the ability to source decentralized GPU computing resources via the NuNet ecosystem will tap into a huge and expanding part of the global computing infrastructure, powering the growing AI industry as well as the emerging Metaverse industry. NuNet's ability to connect decentralized hardware into a single workflow is an attractive possibility for these industries.
This would greatly increase the possibilities of the growing ecosystem on Cardano as already witnessed by the needs of DeepchainAda: Trustless AI training. NuNet can provide resilience and true decentralization through the Cardano network both in CPU and GPU computing domains.
The proposal addresses the Challenge goals in terms of:
- Deployment, testing, and monitoring frameworks
- Support structures
- Incentivization structures
To summarize, this proposal brings value to Cardano by enabling flexible, decentralized, robust, faster or cheaper CPU and GPU resources as a computing framework to support the Cardano ecosystem.
Risk 1: Mostly general technical research and development uncertainties, and the complexity of the project on that side. We are fairly confident that the team will be able to deal with difficulties, but this may require additional time and work.
Risk 2: Complexities of deployment with the pilot partner. To be mitigated by the possibility of including more testing partners from the NuNet open-source community.
Risk 3: Increased hardware prices and uncertainty in the GPU device market. To be mitigated by focused monitoring of price swings and acquiring hardware when prices are lowest.