DeepchainAda: Trustless AI training

Detailed Plan

TL;DR

The DeepchainAda project uses Plutus smart contracts and advanced cryptographic techniques to enable distributed machine learning among parties in a trustless environment. It interfaces the model with the SingularityNet framework using the AI DSL (domain-specific language). It also aims to enable NuNET (distributed computing) in case a party does not have its own compute. All of this should be achieved in a way that preserves privacy and auditability. Such a system will open up collaboration in fields like AI drug discovery and will decentralize federated learning.

Proposal Details

A recent review in Nature Machine Intelligence [1] investigated one question: can COVID-19 be detected using machine learning? That is, given chest X-rays or CT scans, can a model tell whether the patient has COVID-19 or not? The review covered many papers that investigated the use of machine learning in COVID-19 studies. Of the 415 papers considered, none could reliably detect COVID-19. The primary reason was inadequate data, and the main cause of inadequate data is patient privacy policies. Big pharma companies have collaborated on using machine learning while maintaining privacy, in a project named MELLODDY. But that project used a private blockchain.

For many machine learning (ML) applications there are two approaches to maintaining data privacy:

Private blockchain, as in the MELLODDY project
Federated learning

We will explore a third option: use a public blockchain while still ensuring data privacy.

Blockchains are the modern answer to trustless distributed systems. Here the data-providing entities are the parties, and a blockchain can ensure trustless collaboration among them. A blockchain like Cardano can help decentralize federated learning, and DeepchainAda focuses on building this distributed learning framework on Cardano.

Let's look at federated learning first. In federated learning, multiple parties hold the data. The model owner (which could be a research institute) develops the model. The model is then distributed to the parties in encrypted form so that each party can train it locally. This ensures that the data never leaves the party's environment. After one round of training, the model parameters are shared with a central server called the parameter server. It aggregates the parameters and sends the updates back to the parties, and training starts again. This is repeated until the model converges. Because the data is diverse, the expectation is that the model is trained better than it would be by any single party. This collective learning helps improve the accuracy of the model's predictions.

The challenge in this setting is that the parameter server must be trusted. Such a requirement prevents the open collaboration that might otherwise be possible. What is needed is a trustless environment where the parties can still collaborate without a parameter server. DeepchainAda needs to work in a trustless environment with privacy and auditability guarantees. Let's look at how this is achieved below.
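The training rounds described above can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration and not the DeepchainAda implementation: the model is a two-parameter linear regression, the function names (local_update, aggregate, federated_training) are invented here, and model encryption is left out. The aggregate function plays the role of the central parameter server, which is the trusted component DeepchainAda aims to replace.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[float, float]  # (x, y) pair held privately by one party


def local_update(weights: Vector, data: List[Sample], lr: float = 0.1) -> Vector:
    """One local pass of gradient descent on the model y = w0 + w1 * x."""
    w0, w1 = weights
    for x, y in data:
        err = (w0 + w1 * x) - y
        w0 -= lr * err
        w1 -= lr * err * x
    return [w0, w1]


def aggregate(updates: List[Vector], sizes: List[int]) -> Vector:
    """Average the parties' parameters, weighted by their sample counts."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total for i in range(dim)]


def federated_training(parties: List[List[Sample]], rounds: int = 200) -> Vector:
    """Repeat: train locally at each party, aggregate, redistribute."""
    weights = [0.0, 0.0]
    for _ in range(rounds):
        # Only parameters leave a party; the raw samples never do.
        updates = [local_update(weights, data) for data in parties]
        weights = aggregate(updates, [len(d) for d in parties])
    return weights


if __name__ == "__main__":
    # Two parties hold disjoint samples of the same relation y = 2x + 1.
    party_a = [(0.0, 1.0), (1.0, 3.0)]
    party_b = [(2.0, 5.0), (3.0, 7.0)]
    print(federated_training([party_a, party_b]))
```

In this toy setup, the aggregated model recovers parameters close to the underlying relation even though each party only sees part of the data.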

Trustless

Blockchains have shown that a trustless environment is possible; this was first illustrated by Bitcoin. However, a complex task like collaborative deep learning requires a blockchain with programmable smart contracts. In such a trustless environment, each participating party can collaborate with the other parties. Using these smart contracts, the parties can exchange information without needing a trusted parameter server. Cardano is a third-generation blockchain that combines Proof of Stake consensus with smart contracts.

Privacy

For privacy, techniques used in federated learning, such as homomorphic encryption, can be reused here. DeepchainAda uses threshold cryptography, specifically the Paillier cryptosystem, because it supports the homomorphic properties that I need during distributed training. I have talked about this in my DeepchainAda design and video. However, the Paillier system needs certain initial parameters and keys, and usually a trusted dealer is used to distribute the keys. But we cannot assume trust. We can assume that a majority of players (51%) are honest, but we cannot trust any single one of them.
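The homomorphic property mentioned above is additive: multiplying two Paillier ciphertexts yields an encryption of the sum of the plaintexts, so encrypted updates can be aggregated without decrypting any single one. The sketch below is a toy, single-key Paillier in plain Python, written only to illustrate that property. It is an assumption-laden illustration: it is not the threshold variant, it has a single key holder, the primes 293 and 433 are far too small to be secure, and the function names are invented here.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # valid because g = n + 1 (Python 3.8+)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n with fresh randomness."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Recover the plaintext using L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    n, _ = pub
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    pub, priv = keygen(293, 433)
    # Two parties encrypt integer-scaled updates; the sum is computed blind.
    total = add_encrypted(pub, encrypt(pub, 15), encrypt(pub, 27))
    print(decrypt(pub, priv, total))
```

In the design described here, no single party would hold the private key; decryption would instead require cooperation from a threshold of parties.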

So the design was adapted to use threshold Paillier without a trusted dealer, which lets us distribute the key generation. With this design we get a distributed system where you don't trust anyone but assume 51% are honest. So threshold Paillier without a trusted dealer was a good design... until I read about stake-based threshold multisignatures (Mithril) from IOHK Research [2]. If I can add stake to the threshold cryptosystem, then I can decide not only on a threshold number of parties but also on a threshold amount of stake. In DeepchainAda, stake is not Ada; it is the parties' stake in the ML task. This can be something like the amount of data, or any other contribution that boosts a party's stake. This has to be carefully designed so that no single party has full control over training. We need this kind of mechanism mainly because we don't want parties with little stake to take control of training. For example, if 10 parties are collaborating and 6 of them have no stake, those 6 parties could sabotage the collaborative learning under a purely count-based threshold. With stake added to threshold cryptography, these 6 parties together will not have enough stake to influence the collaboration. Details about this will be in my paper.
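The 10-party example above can be made concrete with a small sketch that contrasts a count-based threshold with a stake-weighted one. Everything here is illustrative and assumed: the stake values, the 50% stake fraction, the party names, and the function names are hypothetical, and the check is a plain comparison rather than the cryptographic protocol itself.

```python
from typing import Dict, Iterable


def count_quorum(signers: Iterable[str], threshold: int) -> bool:
    """Count-based rule: enough distinct parties agree."""
    return len(set(signers)) >= threshold


def stake_quorum(signers: Iterable[str], stake: Dict[str, int],
                 fraction: float = 0.5) -> bool:
    """Stake-based rule: the agreeing parties hold more than `fraction` of all stake."""
    total = sum(stake.values())
    signed = sum(stake[p] for p in set(signers))
    return signed > fraction * total


if __name__ == "__main__":
    # Hypothetical stakes: parties p1..p6 contribute nothing, p7..p10 contribute data.
    stake = {f"p{i}": 0 for i in range(1, 7)}
    stake.update({f"p{i}": 25 for i in range(7, 11)})

    no_stake_group = [f"p{i}" for i in range(1, 7)]
    print(count_quorum(no_stake_group, threshold=6))   # 6 of 10 parties agree
    print(stake_quorum(no_stake_group, stake))         # but they hold no stake
    print(stake_quorum(["p7", "p8", "p9"], stake))     # 75 of 100 stake units
```

Under the count-based rule the six zero-stake parties form a quorum, while under the stake-based rule they do not, and three contributing parties do.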

Auditability

Auditability can be ensured because the collaboration is open. A third-party agent can keep the training in check and report any malicious updates. This is where the AI DSL can come in handy. Take an example where 10 parties are collaborating and a third party, whose only job is to monitor the training, detects a malicious actor. This has to be communicated to the parties involved in training, and the parties can then take action, such as ignoring the updates from the malicious actor or removing that actor from the collaboration with a penalty. The AI DSL will simplify this communication between parties without the need for human interference or bias. Again, the details about this will be in my paper.
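As a rough illustration of what such a monitoring agent might check, the sketch below flags parties whose update is far larger than the typical update in a round. This is only one possible heuristic and is not taken from the DeepchainAda design: the norm-versus-median rule, the factor of 3, the function names, and the assumption that the monitor can see a plaintext form of each update are all hypothetical.

```python
import math
from statistics import median
from typing import Dict, List


def l2_norm(update: List[float]) -> float:
    """Euclidean length of one party's update vector."""
    return math.sqrt(sum(x * x for x in update))


def flag_suspicious(updates: Dict[str, List[float]], factor: float = 3.0) -> List[str]:
    """Return parties whose update norm exceeds `factor` times the median norm."""
    norms = {party: l2_norm(u) for party, u in updates.items()}
    typical = median(norms.values())
    return sorted(p for p, n in norms.items() if n > factor * typical)


if __name__ == "__main__":
    round_updates = {
        "party_a": [0.10, 0.20],
        "party_b": [0.20, 0.10],
        "party_c": [0.15, 0.15],
        "party_d": [5.00, -4.00],  # an outsized update
    }
    print(flag_suspicious(round_updates))
```

The flagged list is what the monitor would report to the other parties, who then decide whether to ignore the update or remove the actor.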

Video https://www.youtube.com/watch?v=4u9h7LRv1Qw

Summary

In short, the DeepchainAda project uses Plutus smart contracts and advanced cryptographic techniques to enable distributed machine learning, and interfaces the model with the SingularityNet framework using the AI DSL. It also aims to enable NuNET (distributed computing) in case a data provider does not have its own compute. All of this should be achieved in a way that preserves privacy and auditability.

Please watch my YouTube channel to understand some of these topics; I have a dedicated playlist for this. I will also be publishing a paper for peer review.

I am seeking Catalyst funding as a grant for my research on the DeepchainAda project. The distributed system needs a hardware test setup, and I am seeking funds to build it.

Fund Requirements

Hardware for DeepchainAda testing

Desktops: x86 AMD CPU and 2 Nvidia GPUs each; 3 desktops in total.

For 3 testnet Cardano nodes: 6 Raspberry Pi boards (8 GB RAM, 128/256 GB SSD)

Estimated cost: 25,500 USD (8,000 USD per desktop, 250 USD per Raspberry Pi)

Funding needed: 20,000 USD

Funding from PGWAD pool: 5,500 USD

My pool PGWAD has an IOG delegation, so its rewards will be invested in the pool pledge and also in DeepchainAda. I plan to invest 5,500 USD from pool rewards and seek a grant for the rest.

Please note that this hardware will be shared on NuNET as volunteer computing when DeepchainAda is not using it.
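The budget figures above can be cross-checked with a few lines of arithmetic. The constants are taken directly from the hardware list; the function and variable names are just labels chosen for this check.

```python
# Figures from the hardware list in this proposal (USD).
DESKTOP_COUNT, DESKTOP_COST_USD = 3, 8_000
RASPI_COUNT, RASPI_COST_USD = 6, 250
POOL_CONTRIBUTION_USD = 5_500


def estimated_cost() -> int:
    """Total hardware cost: desktops plus Raspberry Pi boards."""
    return DESKTOP_COUNT * DESKTOP_COST_USD + RASPI_COUNT * RASPI_COST_USD


def grant_needed() -> int:
    """Amount requested after subtracting the pool contribution."""
    return estimated_cost() - POOL_CONTRIBUTION_USD


if __name__ == "__main__":
    print(estimated_cost(), grant_needed())
```

The totals come to 25,500 USD for the hardware and 20,000 USD for the grant, matching the amounts stated above.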

Plan

The plan is:

First 3 months: Focus mainly on smart contract design and implementation, and set up the testnet network
After 6 months: Get deep learning models trained with a supervised algorithm. Confirm that training in the framework converges.
After 10 months (beta release to initial parties): Analyze security impacts
After 1 year (v0.1 pre-release): Study the impact of gradients and the number of parties on training convergence as the number of parties increases.
First public release (1 year), v0.1: Release the framework as open source. The release should enable parties to use the framework when they wish to collaborate on training a model

Once the framework is ready, AI researchers in fields like drug discovery can use it to ensure that the model is trained on diverse private data.

Deliverables

The deliverables of the framework are:

API and library to interact with the DeepchainAda smart contracts
Documentation on how to use the framework
Examples of how to use the system

KPI

- The collaboratively trained model should have higher training accuracy than a reference model trained by a single party

- Training should not overfit the model as the number of parties increases

- Encryption and decryption times should stay the same as the number of parties increases

- The trained model should be easy to integrate with SingularityNet

- One of the parties in the training should use NuNET once it is available

References

[1] Roberts, M., Driggs, D., Thorpe, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat Mach Intell 3, 199–217 (2021). https://doi.org/10.1038/s42256-021-00307-0

[2] Mithril: Stake-based Threshold Multisignatures. IOHK Research. https://iohk.io/en/research/library/papers/mithrilstake-based-threshold-multisignatures/


This proposal was approved and funded by the Cardano Community via Project F7: A.I. & SingularityNet a $5T market Catalyst funding round.
