not approved
RAGDoC: Open Source and Decentralized AI Analysis of Catalyst Proposals
Current Project Status: Unfunded
Amount Received: ₳0
Amount Requested: ₳60,000
Percentage Received: 0.00%
Solution

I will recreate and open source the AI analysis pipeline I made to filter Catalyst proposals for Minswap. However, it will run completely on Cardano infrastructure with fully open source tools.

Problem

Fund 10 had nearly 1600 proposals, which is overwhelming for the standard Catalyst voter. Cardano has many decentralized infrastructure projects, and limited tools for developers to utilize them.

Team

1 member

Please describe your proposed solution.

Outline

  1. Abstract
  2. Background
  3. Approach
  4. Tooling
  5. Audience

Abstract

I want to build an AI analysis pipeline called RAGDoC (Retrieval Augmented Generation for Documents on Cardano) that runs entirely on decentralized Cardano infrastructure (e.g. NuNet and Iagon). This pipeline can cluster and summarize Catalyst proposals (or any set of documents) to make it easier to find proposals that align with your interests. In the process of developing this tool, I will create or further develop the open source tools needed to run this and other pipelines on Cardano infrastructure.

Background

I was a member of a group of Minswap volunteers that provided input to Minswap on the 50 proposals that were voted for in Fund10. With nearly 1600 proposals, there were far too many documents to read through, so I decided to use a combination of RAG models, dimension reduction, and clustering to group proposals together and then have AI models summarize each group. This helped us browse the proposals more easily and find the ones relevant to our community. However, I used OpenAI to accomplish this and never released any of the source code.

Since Fund10, NuNet and Iagon have become much more mature, with functional alphas for compute and storage respectively. Further, Iagon plans to have an alpha version of compute in early 2024. With these tools, it is possible to recreate the workflow I developed for Minswap using only open source tools and decentralized infrastructure. However, tooling is needed to make it easier for developers to use them.

Approach

Broadly, the approach is retrieval augmented generation (RAG) with intermediate dimension reduction and clustering steps. The general steps are Catalyst proposal aggregation, text embedding with a large language model to obtain vector embeddings, dimension reduction of the vector embeddings, clustering, and finally summarization of the contents of each cluster using a large language model. Below is an example of the result of this workflow from Fund10, showing that the model clustered similar proposals together and appropriately summarized them.

> Group 26 (relevance: 100.00%):
>
> The common themes across the proposals include the use of the Aiken programming language, the need for audits and bug bounties, the goal of increasing DeFi usage on Cardano, and the desire to strengthen liquidity in the ecosystem. Other common goals include showcasing the efficiency and interoperability of Aiken, empowering Cardano developers with open-source tools, upgrading contracts for efficiency and functionality, and enabling decentralized renting. Feasibility is a key consideration, with proposals emphasizing technical assessments, prototype development and testing, security audits, user feedback and validation, and community engagement and adoption. The proposals also highlight specific challenges such as the lack of open-source Stableswap and options for launching tokens on Cardano, as well as the need for better user experiences during high chain load. Customizability and adaptability are important factors in addressing these challenges.
>
> Proposals:
>
> Title: Minswap Aiken Stableswap Audit + Bug Bounty
> https://cardano.ideascale.com/a/dtd/101498-163 (332000 ada requested of 9,080,400 ada available)
>
> Title: SundaeSwap Aiken Smart Contracts
> https://cardano.ideascale.com/a/dtd/102976-163 (276000 ada requested of 9,080,400 ada available)
>
> Title: Lenfi V2 Aiken Audit + Bug Bounty
> https://cardano.ideascale.com/a/dtd/103087-163 (265000 ada requested of 9,080,400 ada available)
>
> Title: Revolutionizing Cardano Rewards Contracts: Aiken Language Upgrade for Efficiency and Functionality
> https://cardano.ideascale.com/a/dtd/103870-163 (85000 ada requested of 9,080,400 ada available)
>
> Title: FluidShare: Decentralized Uncollateralized Renting [Release + Audit + Open Source]
> https://cardano.ideascale.com/a/dtd/104787-163 (200000 ada requested of 9,080,400 ada available)
>
> Title: Minswap Aiken V2 Audit
> https://cardano.ideascale.com/a/dtd/105516-163 (467000 ada requested of 9,080,400 ada available)
>
> Title: Minswap Liquidity Bootstrapping for DAOs
> https://cardano.ideascale.com/a/dtd/103138-163 (206000 ada requested of 3,158,400 ada available)

The original version of this workflow used OpenAI for the embedding and summarization steps, but these can be replaced by open source models that also perform better than the OpenAI models. For text embedding, I will use Instructor-XL, developed by researchers at the University of Hong Kong, the University of Washington, Meta AI, and the Allen Institute for AI. For summarization, I will use Llama 2 from Meta. A stretch goal for this project will be to generalize the code to use any model for embedding or summarization.

Vector storage will use FAISS (an MIT licensed project from Facebook). Dimension reduction will support several methods, including UMAP and PaCMAP. Clustering will support a variety of algorithms, including HDBSCAN and standard k-means.
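To make the clustering step concrete, here is a minimal, dependency-free sketch of k-means (Lloyd's algorithm). It is an illustration only: the actual pipeline would rely on library implementations of HDBSCAN or k-means, and the 2-D points below are made-up stand-ins for dimension-reduced proposal embeddings. The function name `kmeans` and the farthest-point initialization are my own choices for this sketch, not part of the original workflow.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return one cluster label per point using plain Lloyd's k-means."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-point initialization (an assumption of this sketch):
    # start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy 2-D points standing in for reduced embeddings (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(kmeans(points, k=2))  # [0, 0, 0, 1, 1, 1]
```

Each label identifies the group a proposal belongs to; the proposals sharing a label would then be passed together to the summarization model.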

Tooling

All tools will be developed in Python, the primary language used for AI development. The tooling component of this proposal is as valuable as the end product itself. I will create new open source tools, or build upon the ones I have already released, to enable AI developers to make use of decentralized infrastructure on Cardano.

nunet-py

NuNet is a decentralized computing project on Cardano that allows individuals to rent out the processing power of their computer. nunet-py is a project I developed while actively testing NuNet during its alpha testing phase, and it allows programmatic execution of jobs on NuNet. It is capable of fully configuring and executing a job on NuNet, but it suffers from some basic usability issues and has no documentation. This tool will be further developed and will be the job submission tool for running the data aggregation, text embedding, clustering, and other steps of RAGDoC.

iagon-py

Iagon is a decentralized, privacy-focused storage solution that runs on Cardano. It allows individuals to rent out disk space on their computer. iagon-py is a project I developed during Iagon's alpha test phase, but it has very rudimentary functionality and no documentation. This tool will be used for storing intermediate data, such as text embeddings, clusters, and summarization information.

cardano-flows

To provide additional utility to developers, it would be helpful to make the workflow of RAGDoC modular so that data aggregation, embeddings, dimension reduction, clustering, and summarization are all separate steps in the process. The reason is that if each task is made into a separate step, the tools can be re-used for other applications. While there are tools for creating workflows in Python, most are tied to a workflow manager directly. cardano-flows will be a new tool used to create and run workflows on Cardano infrastructure. For this proposal, it will use NuNet for compute and Iagon for storage, but it will make the individual components abstractable so that as new projects come online they can be easily added. For example, when Iagon's compute infrastructure comes online, cardano-flows should be built in a way to easily incorporate it as a compute backend.
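As a rough sketch of what "abstractable" components could look like, the example below separates compute and storage behind small interfaces so that a workflow does not depend on a particular provider. Every name here (`ComputeBackend`, `StorageBackend`, `Workflow`, `Step`, and the local stand-ins) is hypothetical: cardano-flows does not exist yet, and this is not the API of nunet-py or iagon-py. The local backends only simulate execution and storage in memory.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


class StorageBackend(ABC):
    """Where step outputs are persisted (Iagon in this proposal)."""

    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Any: ...


class ComputeBackend(ABC):
    """Where a step's function is executed (NuNet in this proposal)."""

    @abstractmethod
    def run(self, func: Callable[[Any], Any], data: Any) -> Any: ...


class InMemoryStorage(StorageBackend):
    """Local stand-in so the sketch runs without any network access."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value

    def get(self, key: str) -> Any:
        return self._items[key]


class LocalCompute(ComputeBackend):
    """Local stand-in that simply calls the function in-process."""

    def run(self, func: Callable[[Any], Any], data: Any) -> Any:
        return func(data)


@dataclass
class Step:
    name: str
    func: Callable[[Any], Any]


@dataclass
class Workflow:
    compute: ComputeBackend
    storage: StorageBackend
    steps: List[Step] = field(default_factory=list)

    def add(self, name: str, func: Callable[[Any], Any]) -> "Workflow":
        self.steps.append(Step(name, func))
        return self

    def run(self, data: Any) -> Any:
        # Run steps in order; persist each intermediate result under the step name.
        for step in self.steps:
            data = self.compute.run(step.func, data)
            self.storage.put(step.name, data)
        return data


flow = Workflow(LocalCompute(), InMemoryStorage())
flow.add("aggregate", lambda titles: [t.strip().lower() for t in titles])
flow.add("embed", lambda titles: [[float(len(t))] for t in titles])  # toy "embedding"
print(flow.run(["  Aiken Audit ", "Stableswap"]))  # [[11.0], [10.0]]
```

Under this design, adding a new provider would mean writing one more subclass of `ComputeBackend` or `StorageBackend`, leaving the workflow steps untouched.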

RAGDoC Dashboard

The final piece of RAGDoC is a Dashboard for browsing Catalyst data, tuning parameters, and submitting workflows. The Dashboard will be created with Solara, a pure Python, React-style web framework. This dashboard will allow users to submit the pipeline to NuNet and access results from Iagon, displayed in an interface that allows users to browse results and link back to the original documents in IdeaScale. Part of this work will include open sourcing some custom components for Solara, such as the wallet connector that allows people to sign transactions and CIP-8 messages (already live and in use on the SteelSwap DEX aggregator).

Audience

I see two general categories of audience for this project:

  1. Individuals and communities voting on Catalyst. This tool can make it faster to find proposals relevant to a community and help ensure that important proposals do not fall through the cracks. It can potentially help to weed out low-quality proposals.
  2. Developers interested in deploying on Cardano infrastructure. The road to RAGDoC comes with the knock-on benefits of better documentation and usability for the underlying tools, which are general utilities not specifically tailored to RAGDoC.

Please define the positive impact your project will have on the wider Cardano community.

The success of this project will give the Cardano community improved mechanisms for evaluating Catalyst proposals, a task that has become increasingly burdensome as the number of submitted proposals has grown.

Further, the success of this project will enable developers to more easily adopt the decentralized computing tools on Cardano.

The success of this project will be evaluated in a few different ways.

  1. I will deploy an instance of the RAGDoC dashboard for people to use in the next funding round. One measure of success will be the amount of traffic to the site.
  2. I will track GitHub stars and clones of the RAGDoC repo.
  3. I will track GitHub stars, clones, and dependent projects for nunet-py, iagon-py, and cardano-flows.

What is your capability to deliver your project with high levels of trust and accountability? How do you intend to validate if your approach is feasible?

I am highly capable of delivering this project with high levels of trust and accountability. I have already developed a prototype of this tool for the Minswap community for Fund10, and I have prototype versions of most of the tools needed to make this work. I have been in active communication with the NuNet and Iagon teams while developing these tools, and I have commitments from the NuNet team for compute resources as I develop this project.

What are the key milestones you need to achieve in order to complete your project successfully?

Outputs

Catalyst proposal aggregation toolbox.

Completion and documentation for nunet-py and iagon-py.

Acceptance Criteria

A GitHub repo with the code needed for aggregating Catalyst proposals.

MkDocs documentation sites for nunet-py and iagon-py, describing all functionality and providing example use cases.

Outputs

A job specification for configuring jobs in a workflow.

Creation of cardano-flows that permits configuring of jobs in Python, with execution and storage on NuNet and Iagon respectively.

Acceptance Criteria

A GitHub repo.

A PyPI package for cardano-flows.

An MkDocs site describing all functionality and providing some simple test cases. One test case will be pulling in Fund11 data, embedding it with an open source model, and storing the result on Iagon.

Output

The RAGDoC dashboard.

Acceptance Criteria

A GitHub repo with a README on how to set up the dashboard.

A dashboard that will execute the analysis workflow and visualize the outputs.

A deployment that serves the dashboard, with CIP-8 login for credentials.

Stretch Goal

Provide configuration for the workflow to permit different AI models, dimension reduction algorithms and parameter tuning, and clustering algorithms.

Who is in the project team and what are their roles?

I, Elder Millenial, am the sole developer on this project. I possess the AI, compute, and tooling skills needed to perform this work. Although I operate under a pseudonym, I will provide any and all verification required if my proposal is selected.

I have already engaged with the NuNet team. I have a direct line of communication to them, and they have committed to providing computing resources for testing.

I have already engaged with the Iagon team, and I have a direct line of communication with them.

Please provide a cost breakdown of the proposed work and resources.

The predominant cost for this project is my time developing it. I estimate this will take 10-20 hours of work per week over the next 6 months. I am asking for ~10,000 ADA/month.

This doesn't account for any other development costs, such as domain names and server costs for the final deliverable; any such additional costs will come out of the final two months' budget.

How does the cost of the project represent value for money for the Cardano ecosystem?

At approximately 15 hours a week and 10k ADA per month, my hourly cost comes out to about $60/hour. This is entirely reasonable for a mid- to senior-level developer, and is below my standard rate.
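The arithmetic behind that figure can be checked with a short calculation. The hours, ADA amounts, and dollar rate come from the text above; the 52/12 weeks-per-month conversion is my assumption, and the ADA price is not stated in the proposal, so the sketch derives the exchange rate that the $60/hour figure implies rather than assuming one.

```python
# Figures taken from the proposal text.
ADA_PER_MONTH = 10_000
HOURS_PER_WEEK = 15
USD_PER_HOUR = 60
MONTHS = 6

# Assumption: an average month has 52 / 12 weeks.
hours_per_month = HOURS_PER_WEEK * 52 / 12

ada_per_hour = ADA_PER_MONTH / hours_per_month
implied_usd_per_ada = USD_PER_HOUR / ada_per_hour  # exchange rate implied by $60/hour
total_ada = ADA_PER_MONTH * MONTHS

print(f"Hours per month:     {hours_per_month:.0f}")
print(f"ADA per hour:        {ada_per_hour:.1f}")
print(f"Implied USD per ADA: {implied_usd_per_ada:.2f}")
print(f"Total request (ADA): {total_ada}")
```

This works out to roughly 154 ADA per hour, which matches $60/hour at an implied price of about $0.39 per ADA, and the six monthly payments sum to the ₳60,000 requested.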
