• 16 Posts
  • 4 Comments
Joined 1Y ago
Cake day: Jun 10, 2023

HyperTech News Report #0003 - Expanding Horizons
cross-posted from: https://lemmy.world/post/6399678 > # 🤖 Happy FOSAI Friday! 🚀 > > Friday, October 6, 2023 > > ## HyperTech News Report #0003 > > Hello Everyone! > > This week highlights a wave of new papers and frameworks that expand upon LLM functionalities. With a tsunami of applications on the horizon, I foresee a bedrock of tools to precede it. I'm not sure what kits and processes will end up part of this bedrock, but I hope some of these methods prove interesting or helpful to your workflow! > > ### Table of Contents > - [Community Changelog](#Community-Changelog) > - [Image of the Week](#Image-of-the-Week) > - [News](#News) > - [Tools & Frameworks](#Tools-&-Frameworks) > - [Papers](#Papers) > > ### Community Changelog > > - [Pinned Mistral Megathread](https://lemmy.world/post/6282482) > - [We're R&D'ing FOSAI Models!](https://lemmy.world/post/6067203) > > ## Image of the Week > > ![](https://lemmy.world/pictrs/image/f3fda57b-8d21-4bb5-951e-f6d00510addd.png) > > This image of the week comes from one of my own projects! I hope you don't mind me sharing - I was really happy with this result. This was generated from an SDXL model I trained and [host on Replicate](https://replicate.com/adynblaed/embersteel-sdxl-andraxus). I use a mock ensemble approach to generate various game assets for an experimental roguelike I'm making with a colleague. > > My current method is not at all efficient, but I have fun. Right now, I have three SDXL models I interact with, each generating art I can use for my project. Andraxus takes care of wallpapers and in-game levels (this image you're seeing here), his in-game companion Biazera imagines characters and entities of this world, while Cerephelo tinkers and toils over the machinations within - crafting items, loot, powerups, etc. > > I've been hesitant about self-promoting here, but if there's genuine interest in this project I would be more than happy to share more details. It's still in pre-alpha development, but there are plans to release all of the models we use as open-source (obviously). We're still working on the engine though. Let me know if you want to see more on this project. > > --- > > ## News > > --- > > 1. **Arxiv Publications Workflow**: A new workflow has been introduced that allows users to scrape search topics from Arxiv, converting the results into markdown (MD) format. This makes it easier to digest and understand topics from Arxiv-published content. The tool, available on [GitHub](https://github.com/daveshap/weekly_arxiv), is particularly useful for those who wish to delve deeper into research papers and run their own research processes. > > 2. **Texting LLMs from Your Phone**: [A guide has been shared](https://www.twilio.com/blog/ai-sms-chatbot-replicate-llama-2-langchain) that enables users to communicate with their personal assistants via simple text messages. The process involves setting up a Twilio account, purchasing and registering a phone number, and then integrating it with the Replicate platform. The code, available on [GitHub](https://github.com/elizabethsiegle/replicate-llama2-sms-chatbot), makes it possible to send and receive messages from LLMs directly on one's phone. > > 3. **Microsoft's AutoGen**: Microsoft has released AutoGen, a tool designed to aid in the creation of autonomous LLM agents. Compatible with ChatGPT models, AutoGen facilitates the development of LLM applications using multiple agents that can converse with each other to solve tasks. The framework is customizable and allows for seamless human participation. 
More details can be found on [GitHub](https://github.com/microsoft/autogen). > > 4. **Promptbench and ACE Framework**: `Promptbench` is a new project focused on the evaluation and benchmarking of models. Stemming from the DyVal paper, it aims to provide reliable insights into model performance. On the other hand, the ACE Framework, designed for `autonomous cognitive entities`, offers a unique approach to agent tooling. While still in its early stages, it promises to bring about innovative implementations in the realms of personal assistants, game world NPCs, autonomous employees, and embodied robots. > > 5. **Research Highlights**: Several papers have been published that delve into the intricacies of LLMs. One paper introduces a method to enhance the zero-shot reasoning abilities of LLMs, while another, titled DyVal, proposes a dynamic evaluation protocol for LLMs. Additionally, the concept of Low-Rank Adapters (LoRA) ensembles for LLM fine-tuning has been explored, emphasizing the potential of using one model and dynamically swapping the fine-tuned QLoRA adapters. > > --- > > ## Tools & Frameworks > > --- > > #### Keep Up w/ Arxiv Publications > > - [GitHub](https://github.com/daveshap/weekly_arxiv) > - [Learn More](https://www.youtube.com/watch?v=YwUy5meLnNY&) > > Due to a drastic change in personal and work schedules, I've had to shift how I research and develop posts and projects for you guys. That being said, I found this workflow from the author of the ACE Framework particularly helpful. It scrapes a search topic from Arxiv and returns a massive XML that is converted to markdown (MD) to then be used as an injectable context report for an LLM of your choosing (to further break down and understand topics) or as a well of information for the classic CTRL + F search. Either way, you end up with aggregated (and human-readable) info from Arxiv-published content. > > After reading the abstracts you can drill further into each paper and dissect / run your own research processes as you see fit. There is definitely more room for automation and organization here I'm sure, but this has been a big resource for me lately, so I wanted to share it with others who might find it helpful too. > > #### Text LLMs from Your Phone > > - [GitHub](https://github.com/elizabethsiegle/replicate-llama2-sms-chatbot) > - [Learn More](https://www.twilio.com/blog/ai-sms-chatbot-replicate-llama-2-langchain) > > I had an itch to make my personal assistants more accessible - so I started investigating ways I could simply text them from my iPhone (via simple SMS). There are many other ways I could've done this, but texting has always been something I default to in communications. So, I found this cool guide that uses infra I already prefer (Replicate) and has a bonus LangChain integration - which opens up the door to a ton of other opportunities down the line. > > This tutorial was pretty straightforward - but to be honest, making the Twilio account, buying a phone number (then registering it) took the longest. The code itself takes less than 10 minutes to get up and running with ngrok. Super simple and straightforward there. The Twilio process? Not so much, but it was worth the pain! > > I am still waiting on my phone number to be verified (so that the Replicate inference endpoint can actually send SMS back to me), but I ended the night successfully texting the server on my local PC. A minimal sketch of the webhook loop is shown below.
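> For anyone curious what that webhook loop looks like, here is a minimal sketch, assuming a Flask server exposed via ngrok and a Replicate-hosted Llama 2 - the model slug and parameters are placeholders, not the exact code from the linked repo:
>
> ```
> import replicate
> from flask import Flask, request
> from twilio.twiml.messaging_response import MessagingResponse
>
> app = Flask(__name__)
>
> @app.route("/sms", methods=["POST"])
> def sms_reply():
>     # Twilio posts the incoming SMS body as form data
>     prompt = request.form.get("Body", "")
>
>     # Placeholder model slug - swap in whichever Replicate model you use
>     output = replicate.run(
>         "meta/llama-2-70b-chat",
>         input={"prompt": prompt, "max_new_tokens": 200},
>     )
>     reply = "".join(output)
>
>     # Respond with TwiML so Twilio texts the completion back
>     resp = MessagingResponse()
>     resp.message(reply)
>     return str(resp)
>
> if __name__ == "__main__":
>     # Expose this with ngrok and point the Twilio number's webhook at <url>/sms
>     app.run(port=5000)
> ```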
It was wild texting the Ahsoka example from my phone and seeing the POST response return (even though it didn't go through SMS, I could still see the server successfully receive my incoming message/prompt). I think there's a lot of fun to be had giving casual phone numbers and personalities to assistants like this. Especially if you want to LangChain some functions beyond just the conversation. If there's more interest in this topic, I can share how my assistant evolves once it gets full access to return SMS. I am designing this to streamline my personal life, and if it proves to be useful I will absolutely release the project as open-source. > > #### AutoGen > > - [GitHub](https://github.com/microsoft/autogen) > - [Learn More](https://microsoft.github.io/autogen/docs/Examples/AutoGen-AgentChat/) > - [Tutorial](https://www.youtube.com/watch?v=vU2S6dVf79M&ab_channel=MatthewBerman) > > With Agents on the rise, tools and automation pipelines to build them have become increasingly more important to consider. It seems like Microsoft is well aware of this, and thus released AutoGen, a tool to help enable this automation tooling and creation of autonomous LLM agents. AutoGen is compatible with ChatGPT models and is being kitted for local LLMs as we speak. > > > AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. > > #### Promptbench > > - [GitHub](https://github.com/microsoft/promptbench) > - [Learn More](https://llm-eval.github.io/) > > I recently found `promptbench` - a project that seems to have stemmed from the DyVal paper (shared below). I, for one, appreciate some of the new tools releasing that focus on the evaluation and benchmarking of models. I hope we continue to see more evals, benchmarks, and projects that give us insights we can rely upon. > > #### ACE Framework > > ![](https://lemmy.world/pictrs/image/95712590-470e-4a5a-8a79-30c0b1bdb1e5.png) > > - [GitHub](https://github.com/daveshap/ACE_Framework) > - [Learn More](https://www.youtube.com/watch?v=A_BL_pu4Gtk&ab_channel=4IRwithDavidShapiro) > > A new framework has been proposed and designed for `autonomous cognitive entities`. This appears similar to agents and their style of tooling, but with a different architectural approach. I don't believe an implementation of this is ready, but it may be soon, and it is something to keep an eye on. > > > There are many possible implementations of the ACE Framework. Rather than detail every possible permutation, here is a list of categories that we perceive as likely and viable. > > > **Personal Assistant and/or Companion** > > > - This is a self-contained version of ACE that is intended to interact with one user. > > - Think of Cortana from HALO, Samantha from HER, or Joi from Blade Runner 2049. (yes, we recognize these are all sexualized female avatars) > > - The idea would be to create something that is effectively a personal Executive Assistant that is able to coordinate, plan, research, and solve problems for you. > This could be deployed on mobile, smart home devices, laptops, or websites. > > > **Game World NPC's** > > > - This is a kind of game character that has their own personality, motivations, agenda, and objectives. Furthermore, they would have their own unique memories. 
> > - This can give NPCs a much more realistic ability to pursue their own objectives, which should make game experiences much more dynamic and unpredictable, thus raising novelty. > These can be adapted to 2D or 3D game engines such as PyGame, Unity, or Unreal. > > > **Autonomous Employee** > > > - This is a version of the ACE that is meant to carry out meaningful and productive work inside a corporation. > > - Whether this is a digital CSR or backoffice worker depends on the deployment. > > - It could also be a "digital team member" that primarily interacts via Discord, Slack, or Microsoft Teams. > > > **Embodied Robot** > > > The ACE Framework is ideal to create self-contained, autonomous machines. Whether they are domestic aid robots or something like WALL-E > > --- > > ## Papers > > --- > > [Agent Instructs Large Language Models to be General Zero-Shot Reasoners > ](https://arxiv.org/abs/2310.03710) > > > We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%. > > [DyVal: Graph-informed Dynamic Evaluation of Large Language Models > ](https://arxiv.org/abs/2309.17167v2) > > - https://llm-eval.github.io/ > - https://github.com/microsoft/promptbench > > > Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns about their performance are raised on potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs. 
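> To give a flavor of what dynamically generated evaluation samples can look like, here is a toy sketch in the spirit of DyVal - an illustration of the idea, not DyVal's actual code. It builds a random arithmetic problem of controllable depth and computes the ground truth on the fly, so the sample cannot already sit in any training corpus:
>
> ```
> import random
>
> def make_sample(depth: int):
>     """Recursively build a random arithmetic expression of a given depth.
>     Returns (expression_string, ground_truth_value)."""
>     if depth == 0:
>         n = random.randint(1, 9)
>         return str(n), n
>     op = random.choice(["+", "-", "*"])
>     left_s, left_v = make_sample(depth - 1)
>     right_s, right_v = make_sample(depth - 1)
>     value = {"+": left_v + right_v, "-": left_v - right_v, "*": left_v * right_v}[op]
>     return f"({left_s} {op} {right_s})", value
>
> # Complexity is controllable via depth, and every run yields fresh samples,
> # which sidesteps the data-contamination problem static benchmarks have.
> expr, answer = make_sample(depth=3)
> print(f"Q: What is {expr}? A: {answer}")
> ```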
> > [LoRA ensembles for large language model fine-tuning](https://arxiv.org/abs/2310.00035v2) > > > Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude less than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on its own or on top of pre-existing regularization techniques, gives consistent improvements in predictive accuracy and uncertainty quantification. > > There is something to be discovered between LoRA, QLoRA, and ensemble/MoE designs. I am digging into this niche because of an [interesting bit I heard from sentdex](https://www.youtube.com/watch?v=J_3hDqSvpmg&ab_channel=sentdex) (if you want to skip to the part I'm talking about, go to 13:58). Around the 15:00 mark he brings up QLoRA adapters (nothing new), but his approach was interesting. > > He eventually shares that he is working on a QLoRA ensemble approach with skunkworks (presumably a skunkworks-style research team). This confirmed my suspicion. Better yet - he shared his thoughts on how all of this could be done. Watch and support his [video](https://www.youtube.com/watch?v=J_3hDqSvpmg&ab_channel=sentdex) for more insights, but the idea boils down to using one model and dynamically swapping the fine-tuned QLoRA adapters. I think this is a highly efficient and under-applied approach, especially in the MoE and ensemble realm of design. If you're reading this and understood anything I said - get to building! This is a seriously interesting idea that could yield positive results. I will share my findings when I find the time to dig into this more. > > --- > > ### Author's Note > > This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called `HyperionTechnologies` a.k.a. `HyperTech` or `HYPERION` - a sci-fi company. > > ### Thanks for Reading! > > This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now... if you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where you can join us on the journey into the great unknown! > > Until next time! > > #### `Blaed`
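To make the adapter-swapping idea above concrete, here is a minimal sketch using HuggingFace `peft` - the adapter paths and names are placeholders, and a real QLoRA setup would also load the base model in 4-bit:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# One shared base model stays resident in memory...
base_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# ...while several small fine-tuned adapters are loaded alongside it.
model = PeftModel.from_pretrained(base, "./adapters/sql-expert", adapter_name="sql")
model.load_adapter("./adapters/python-expert", adapter_name="python")

# Swap which "expert" is active per request - the expensive base weights never move.
model.set_adapter("python")
inputs = tokenizer("Write a function that reverses a string.", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```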

HyperTech News Report #0002 - A New Challenger Approaches!
cross-posted from: https://lemmy.world/post/5965315 > # 🤖 Happy FOSAI Friday! 🚀 > > Friday, September 29, 2023 > > ## HyperTech News Report #0002 > > Hello Everyone! > > Welcome back to the HyperTech News Report! This week we're seeing some really exciting developments in futuristic technologies. With more tools and methods releasing by the day, I feel we're in for a renaissance in software. I hope hardware is soon to follow, but I am here for it! So are you. Brace yourselves. Change is coming! This next year will be very interesting to watch unfold. > > #### Table of Contents > - [New Foundation Model!](#New-Foundation-Model) > - [Metaverse Developments](#Metaverse-Developments) > - [NVIDIA NeMo Guardrails](#NVIDIA-NeMo-Guardrails) > - [Tutorial Highlights](#Tutorial-Highlights) > > #### Community Changelog > > - Cleaned up some old content (let me know if you notice something that should be archived or updated) > > ### Image of the Week > ![](https://lemmy.world/pictrs/image/d04adbda-7ce5-4edb-a9d2-3aa6920fa163.png) > > This image of the week comes from a DALL-E 3 demonstration by Will Depue. It depicts a popular benchmark image for diffusion models - the astronaut riding a horse in space. Apparently this was hard to get right, and others have had trouble replicating it - but it seems to have been generated by DALL-E 3 nevertheless. Curious to see how it stacks up against other diffusers when it's more widely available. > > ### New Foundation Model! > > There have been many new models hitting HuggingFace on the daily. The recent influx has made it hard to benchmark and keep up with these models - so I will be highlighting a hand-selected curation week-by-week, exploring these with more focus (a few at a time). > > If you have any model favorites (or showcase suggestions) let me know what they are in the comments below and I'll add them to the growing catalog! > > This week we're taking a look at Mistral - a new *foundation* model with a sliding window attention mechanism that gives it advantages over other models. Better yet - the mistral.ai team released this new model under the Apache 2.0 license. Massive shoutout to this team - this is huge for anyone who wants more options (commercially) outside of the Llama 2 and Falcon families. > > From Mistralai: > > > The best 7B, Apache 2.0. Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It’s released under Apache 2.0 licence, and we made it easy to deploy on any cloud. > > [Learn More](https://mistral.ai/news/announcing-mistral-7b/) > > Mistralai > - https://huggingface.co/mistralai/Mistral-7B-v0.1 > - https://mistral.ai/news/announcing-mistral-7b/ > - https://docs.mistral.ai/quickstart/ > > TheBloke (Quantized) > - https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF > - https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ > > More About GPTQ > - https://github.com/ggerganov/llama.cpp/pull/1827 > > More About GGUF > - https://github.com/ggerganov/llama.cpp/pull/2398#issuecomment-1682837610 > > ### Metaverse Developments > > Mark Zuckerberg had his third round interview on the Lex Fridman podcast - but this time, in the [updated Metaverse](https://www.youtube.com/watch?v=MVYrJJNdrEg&t=1223s&ab_channel=LexFridman). This is pretty wild. We seem to have officially left uncanny valley territory. 
There are still clearly bugs and improvements to be made - but imagine the possibilities of this mixed reality technology (paired with VR LLM applications). > > The type of experiences we can begin to explore in these digital realms is going to evolve into things of true sci-fi in our near future. This is all very exciting stuff to look forward to as AI proliferates markets and drives innovation. > > What do you think? Zuck looks more human in the metaverse than in real life.. mission.. success? > > [Click here for the podcast episode](https://www.youtube.com/watch?v=MVYrJJNdrEg&t=1223s&ab_channel=LexFridman). > > ### NVIDIA NeMo Guardrails > > If you haven't heard about NeMo Guardrails, you should check it out. It is a new library and approach for aligning models and completing functions for LLMs. It is similar to LangChain and LlamaIndex, but uses an in-house language from NVIDIA called 'Colang' for configuration, with the NeMo Guardrails libraries exposed in Python-friendly syntax (a minimal usage sketch follows at the end of this post). > > This is still a new and unexplored tool, but it could provide some interesting results with some creative applications. It is also particularly powerful if you need to align enterprise LLMs for clients or stakeholders. > > [Learn More](https://github.com/NVIDIA/NeMo-Guardrails) > > ### Tutorial Highlights > > Mistral 7B - Small But Mighty 🚀 🚀 > - https://www.youtube.com/watch?v=z4wPiallZcI&ab_channel=PromptEngineering > > Chatbots with RAG: LangChain Full Walkthrough > - https://www.youtube.com/watch?v=LhnCsygAvzY&ab_channel=JamesBriggs > > NVIDIA NeMo Guardrails: Full Walkthrough for Chatbots / AI > - https://www.youtube.com/watch?v=SwqusllMCnE&t=1s&ab_channel=JamesBriggs > > ### Author's Note > > This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called `HyperionTechnologies` a.k.a. `HyperTech` or `HYPERION` - a sci-fi company. > > ### Thanks for Reading! > > If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time. > > Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown. > > Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse. > > This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now... > > Until next time! > > #### `Blaed`
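As promised, here is a minimal sketch of wiring up NeMo Guardrails in Python. The Colang flow and the model config below are illustrative assumptions - check the repo's docs for current syntax and supported engines:

```
from nemoguardrails import LLMRails, RailsConfig

# Colang defines the conversational "rails"; the YAML selects the backing LLM.
colang_content = """
define user express greeting
  "hello"
  "hi there"

define bot express greeting
  "Hello! How can I help you today?"

define flow greeting
  user express greeting
  bot express greeting
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

# Messages that match a defined flow get the railed response.
print(rails.generate(messages=[{"role": "user", "content": "hello"}]))
```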

HyperTech News Report #0001 - Happy FOSAI Friday!
cross-posted from: https://lemmy.world/post/5549499 > # 🤖 Happy FOSAI Friday! 🚀 > > Friday, September 22, 2023 > > ## HyperTech News Report #0001 > > Hello Everyone! > > This series is a new vehicle for !fosai@lemmy.world news reports. In these posts I'll go over projects or news I stumble across week-over-week. I will try to keep Fridays consistent with this series, covering most of what I have been already (but at a regular cadence). For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already. > > #### Table of Contents > - [Introducing HyperTech](#Introducing-HyperTech) > - [New GGUF Models](#GGUF-Models) > - [Falcon 180B](#Falcon-180B) > - [Llama 3 Rumors](#LLama-3-Rumors) > - [DALM RAG Toolkit](#DALM) > - [DALL-E 3](#DALL-E-3) > > #### Community Changelog > > - Updated all resources on [FOSAI ▲ XYZ](https://fosai.xyz). > - Added new content to [FOSAI ▲ XYZ](https://fosai.xyz). > - Added new content and resources to the !fosai@lemmy.world sidebar. > - Added `HyperTech` to !fosai@lemmy.world, reflecting personal workflows and processes. > - All changes should be visible within the next 48 hours. > > ### Image of the Week > > A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models. > > [Read More](https://arstechnica.com/information-technology/2023/09/dreamy-ai-generated-geometric-scenes-mesmerize-social-media-users/) > > ![](https://lemmy.world/pictrs/image/be414224-7603-4a60-8be5-485b122561bd.png) > > ### Introducing HyperTech > > `HyperionTechnologies` a.k.a. `HyperTech` or `HYPERION` - a sci-fi company. > > [`HyperTech Workshop (V0.1.0)`](https://github.com/adynblaed/hypertech-workshop) > > I am excited to announce my technology company: `HyperTech`. The first project of `HyperionTechnologies` is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. `HyperTech` is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satire corpo sandbox I have designed around my personal journey inside and outside of !fosai@lemmy.world with highly experimental projects and workflows. I will be using this company and its setting/narrative/theme to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purpose or amusement. I'll be doing a dedicated post to `HyperTech` later this weekend. Keep your eye out for that if you're curious. The future is now. The future is bright. The future is `HYPERION`. (don't take this project too seriously). > > ### New GGUF Models > > Within this last month or so, [llama.cpp](https://github.com/ggerganov/llama.cpp) has begun to standardize a new model format - the `.GGUF` model - which is much more optimized than its now-legacy and deprecated predecessor, `GGML`. This is a big deal for anyone running `GGML` models. `GGUF` is basically superior in all ways. Check out [llama.cpp's notes about this change on their official GitHub](https://github.com/ggerganov/llama.cpp/pull/2398#issuecomment-1682837610). I have used a few `GGUF` models myself and have found them much more performant than any GGML counterpart (a quick usage sketch follows below). 
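> A minimal sketch of running a `GGUF` file locally with the `llama-cpp-python` bindings - the model path is a placeholder for whichever quantization you download:
>
> ```
> from llama_cpp import Llama
>
> # Any .gguf file works here, e.g. one of TheBloke's conversions
> llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
>
> out = llm("Q: Name the planets in the solar system. A:", max_tokens=128, stop=["Q:"])
> print(out["choices"][0]["text"])
> ```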
[TheBloke](https://huggingface.co/TheBloke) has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp). > > [More About GGUF](https://github.com/ggerganov/llama.cpp/pull/2398#issuecomment-1682837610): > > > It is a successor file format to `GGML`, `GGMF` and `GGJT`, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to `GGML` without breaking compatibility with older models. Basically: 1.) No more breaking changes 2.) Support for non-llama models. (falcon, rwkv, bloom, etc.) and 3.) No more fiddling around with `rope-freq-base`, `rope-freq-scale`, `gqa`, and `rms-norm-eps`. Prompt formats could also be set automatically. > > ### Falcon 180B > > Many of you have probably already heard of this, but Falcon 180B was recently announced - and I haven't covered it here yet, so it's worth mentioning in this post. Check out the full article regarding its release [here on HuggingFace](https://huggingface.co/blog/falcon-180b). Can't wait to see what comes next! This will open up a lot of doors for us to explore. > > > Today, we're excited to welcome [TII's](https://falconllm.tii.ae/) Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's [RefinedWeb dataset](https://huggingface.co/datasets/tiiuae/falcon-refinedweb). This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch. > > > The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets. > > > ‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes. > > > You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the [Falcon Chat Demo Space](https://huggingface.co/spaces/tiiuae/falcon-180b-demo). > > ### LLama 3 Rumors > > Speaking of big open-source models - [Llama 3 is rumored](https://mspoweruser.com/llama-3/) to be in training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack up in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see it within the decade. > > > Meta plans to rival GPT-4 with a rumored free Llama 3. According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4 but will remain largely free under the Llama license. Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. 
Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach. Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans. > > ### DALM > > I recently stumbled across [DALM](https://github.com/arcee-ai/DALM) - a new domain-adapted language modeling toolkit which is supposed to enable a workflow that trains a [retrieval augmented generation](https://arxiv.org/abs/2005.11401) ([RAG](https://research.ibm.com/blog/retrieval-augmented-generation-RAG)) pipeline from end-to-end. According to their results, the DALM-specific training process leads to a much higher response quality when it comes to retrieval augmented generation. I haven't had a chance to tinker with this a lot, but I'd keep an eye on it if you're engaging with RAG workflows. > > [DALM](https://github.com/arcee-ai/DALM) Manifesto: > > > A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin the next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview. > > > For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient. > > ### DALL-E 3 > > [OpenAI announced DALL-E 3](https://openai.com/dall-e-3), which will have direct native compatibility with ChatGPT. This means users should be able to naturally and semantically iterate over images and features over time, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows. > > I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates. > > I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet, would be a powerful combination that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up. > > [More About DALL-E 3](https://openai.com/dall-e-3): > > > DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. 
DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words. > > > DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them. > > ### Author's Note > > This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called `HyperionTechnologies` a.k.a. `HyperTech` or `HYPERION` - a sci-fi company. > > ### Thanks for Reading! > > If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time. > > Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown. > > Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse. > > This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now... > > Until next time! > > #### `Blaed`

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B
cross-posted from: https://lemmy.world/post/3879861 > # Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B > > Hello everyone! This post marks an exciting moment for !fosai@lemmy.world and everyone in the open-source large language model and AI community. > > We appear to have a new contender on the block, a model apparently capable of surpassing OpenAI's state-of-the-art GPT-4 in coding evals (evaluations). > > This is huge. Not too long ago I made an offhand comment about us catching up to GPT-4 within a year. I did not expect that prediction to end up being reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI. > > Buckle up, it's going to get interesting! > > Here are some notes from the blog, which you should visit and read in its entirety: > > - https://www.phind.com/blog/code-llama-beats-gpt4 > > --- > # [Blog Post](https://www.phind.com/blog/code-llama-beats-gpt4) > > We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved **67.6% and 69.5% pass@1 on HumanEval**, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset. > > The CodeLlama models released yesterday demonstrate impressive performance on HumanEval. > > - CodeLlama-34B achieved 48.8% pass@1 on HumanEval > - CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval > > We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent a native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens. > > Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and **found no contaminated examples.**  > > The methodology is: > > - For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters. > - A match was identified if any sampled substring was a substring of the processed training example. > > For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models: > > - Phind-CodeLlama-34B-v1 achieved **67.6% pass@1** on HumanEval > - Phind-CodeLlama-34B-Python-v1 achieved **69.5% pass@1** on HumanEval > > ### Download > > We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results. > > - https://huggingface.co/Phind/Phind-CodeLlama-34B-v1 > - https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1 > > --- > > If you get a chance to try either of these models out, let us know how it goes in the comments below! > > If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world. > > Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.
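The decontamination methodology quoted above is simple enough to reproduce. Here is a minimal sketch of that substring-sampling check - an illustration of the described procedure, not Phind's actual code:

```
import random

def is_contaminated(eval_example: str, train_examples: list[str]) -> bool:
    """Flag an eval example if any sampled 50-char substring appears in training data."""
    if len(eval_example) <= 50:
        samples = [eval_example]  # short examples are checked whole
    else:
        starts = [random.randrange(len(eval_example) - 49) for _ in range(3)]
        samples = [eval_example[i : i + 50] for i in starts]
    return any(s in train for train in train_examples for s in samples)

# Toy usage: the eval example appears verbatim in the training set, so it flags.
train_set = ["def add(a, b):\n    return a + b\n"]
print(is_contaminated("def add(a, b):\n    return a + b", train_set))  # True
```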

Introducing Stable-Diffusion.cpp (Inference in Pure C/C++)
cross-posted from: https://lemmy.world/post/3549390 > ## stable-diffusion.cpp > > Introducing `stable-diffusion.cpp`, a pure C/C++ inference engine for Stable Diffusion! This is a really awesome implementation to help speed up home inference of diffusion models. > > Tailored for developers and AI enthusiasts, this repository offers a high-performance solution for creating and manipulating images using various quantization techniques and accelerated inference. > > - https://github.com/leejet/stable-diffusion.cpp > > --- > > ### **Key Features:** > > - **Efficient Implementation:** Utilizing plain C/C++, it operates seamlessly like [llama.cpp](https://github.com/ggerganov/llama.cpp) and is built on the [ggml](https://github.com/ggerganov/ggml) framework. > - **Multiple Precision Support:** Choose between 16-bit, 32-bit float, and 4-bit to 8-bit integer quantization. > - **Optimized Performance:** Experience memory-efficient CPU inference with AVX, AVX2, and AVX512 support for x86 architectures. > - **Versatile Modes:** From original `txt2img` to `img2img` modes and negative prompt handling, customize your processing needs. > - **Cross-Platform Compatibility:** Runs smoothly on Linux, Mac OS, and Windows. > > --- > > ### **Getting Started** > Cloning, building, and running are made simple, and detailed examples are provided for both text-to-image and image-to-image generation. With an array of options for precision and comprehensive usage guidelines, you can easily adapt the code for your specific project requirements. > > ``` > git clone --recursive https://github.com/leejet/stable-diffusion.cpp > cd stable-diffusion.cpp > ``` > > - If you have already cloned the repository, you can use the following command to update the repository to the latest code. > > ``` > cd stable-diffusion.cpp > git pull origin master > git submodule update > ``` > > --- > > ### More Details > > - Plain C/C++ implementation based on [ggml](https://github.com/ggerganov/ggml), working in the same way as [llama.cpp](https://github.com/ggerganov/llama.cpp) > - 16-bit, 32-bit float support > - 4-bit, 5-bit and 8-bit integer quantization support > - Accelerated memory-efficient CPU inference > - Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image > - AVX, AVX2 and AVX512 support for x86 architectures > - Original `txt2img` and `img2img` mode > - Negative prompt > - [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) style tokenizer (not all the features, only token weighting for now) > - Sampling method > - `Euler A` > - Supported platforms > - Linux > - Mac OS > - Windows > > --- > > This is a really exciting repo. I'll be honest, I don't think I'm as well-versed in what's going on with diffusion inference - but I do know more efficient and effective methods of running those models are always welcomed by people who frequently use diffusers. Especially for those who need to multi-task and maintain performance headroom.
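For the curious, here is roughly what invoking the built binary can look like, sketched as a Python wrapper. The build steps, binary location, and flags below are assumptions based on typical ggml-project conventions - check the repo's README and `--help` for the actual interface:

```
import subprocess

# Assumes the project was built after cloning, e.g.:
#   mkdir build && cd build && cmake .. && cmake --build . --config Release
# Binary location, model path, and flags below are placeholders to verify.
subprocess.run(
    [
        "./build/bin/sd",                # built stable-diffusion.cpp binary
        "-m", "./models/model-f16.bin",  # converted/quantized model weights
        "-p", "a lovely cat",            # text prompt
        "-o", "./output.png",            # output image path
    ],
    check=True,
)
```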

Incognito Pilot: The Next-Gen AI Code Interpreter for Sensitive Data
cross-posted from: https://lemmy.world/post/3350022 > **Incognito Pilot: The Next-Gen AI Code Interpreter for Sensitive Data** > > Hello everyone! Today marks the first day of a new series of posts featuring projects in my [**GitHub Stars**](https://github.com/adynblaed?tab=stars). > > Most of these repos are FOSS & FOSAI focused, meaning they should be hackable, free, and (mostly) open-source. > > We're going to kick this series off by sharing [**Incognito Pilot**](https://github.com/silvanmelchior/IncognitoPilot). It’s like the ChatGPT Code Interpreter but for those who prioritize data privacy. > > ![](https://lemmy.world/pictrs/image/514129c2-9a98-4e82-b315-fd9094f1d7c0.png) > > **Project Summary from ChatGPT-4**: > > **Features:** > > - Powered by Large Language Models like GPT-4 and Llama 2. > - Run code and execute tasks with Python interpreter. > - _Privacy_: Interacts with cloud but sensitive data stays local. > - **Local or Remote**: Choose between local LLMs (like Llama 2) or API (like GPT-4) with data approval mechanism. > > You can use Incognito Pilot to: > > - Analyse data, create visualizations. > - Convert files, e.g., video to gif. > - Internet access for tasks like downloading data. > > Incognito Pilot ensures data privacy while leveraging GPT-4's capabilities. > > **Getting Started:** > > 1. **Installation**: > > - Use Docker (For Llama 2, check dedicated installation). > - Create a folder for Incognito Pilot to access. Example: `/home/user/ipilot`. > - Have an OpenAI account & API key. > - Use the provided docker command to run. > - Access via: [http://localhost:3030](http://localhost:3030/) > - _Bonus_: Works with OpenAI's free trial credits (For GPT-3.5). > 2. **First Steps**: > > - Chat with the interface: Start by saying "Hi". > - Get familiar: Command it to print "Hello World". > - Play around: Make it create a text file with numbers. > > **Notes**: > > - Data you enter and approved code results are sent to cloud APIs. > - All other data is processed locally. > - Advanced users can customize Python interpreter packages for added functionalities. > > **FAQs**: > > - **Comparison with ChatGPT Code Interpreter**: Incognito Pilot offers a balance between privacy and functionality. It allows internet access, and can be run on powerful machines for larger tasks. > > - **Why use Incognito Pilot over just ChatGPT**: Multi-round code execution, tons of pre-installed dependencies, and a sandboxed environment. > > - **Data Privacy with Cloud APIs**: Your core data remains local. Only meta-data approved by you gets sent to the API, ensuring a controlled and conscious usage. > > --- > > Personally, my only concern using ChatGPT has always been about data privacy. This explores an interesting way to solve that while still getting the state of the art performance that OpenAI has managed to maintain (so far). > > I am all for these pro-privacy projects. I hope to see more emerge! > > If you get a chance to try this, let us know your experience in the comments below! > > --- > > **Links from this Post** > - https://github.com/adynblaed?tab=stars > - https://github.com/silvanmelchior/IncognitoPilot > - Subscribe to !fosai@lemmy.world!

Vicuna v1.5 Has Been Released!
[Click Here to be Taken to the Megathread!](https://lemmy.world/post/2580983) from !fosai@lemmy.world > # **Vicuna v1.5 Has Been Released!** > > Shoutout to [GissaMittJobb@lemmy.ml](https://lemmy.ml/GissaMittJobb) for catching this in an [earlier post](https://lemmy.world/post/2575250). > > Given Vicuna was a widely appreciated member of the original Llama series, it'll be exciting to see this model evolve and adapt with fresh datasets and new training and fine-tuning approaches. > > Feel free to use this megathread to chat about Vicuna and any of your experiences with Vicuna v1.5! > > # **Starting off with Vicuna v1.5** > > TheBloke is already sharing models! > > ## **Vicuna v1.5 GPTQ** > > ### 7B > - [Vicuna-7B-v1.5-GPTQ](https://huggingface.co/TheBloke/vicuna-7B-v1.5-GPTQ) > - [Vicuna-7B-v1.5-16K-GPTQ](https://huggingface.co/TheBloke/vicuna-7B-v1.5-16K-GPTQ) > > ### 13B > - [Vicuna-13B-v1.5-GPTQ](https://huggingface.co/TheBloke/vicuna-13B-v1.5-GPTQ) > > --- > > ## **Vicuna Model Card** > > ### **Model Details** > > Vicuna is a chat assistant fine-tuned from Llama 2 on user-shared conversations collected from ShareGPT. > > #### Developed by: LMSYS > > - **Model type**: An auto-regressive language model based on the transformer architecture > - **License**: Llama 2 Community License Agreement > - **Finetuned from model**: Llama 2 > > #### Model Sources > > - **Repository**: [https://github.com/lm-sys/FastChat](https://github.com/lm-sys/FastChat) > - **Blog**: [https://lmsys.org/blog/2023-03-30-vicuna/](https://lmsys.org/blog/2023-03-30-vicuna/) > - **Paper**: [https://arxiv.org/abs/2306.05685](https://arxiv.org/abs/2306.05685) > - **Demo**: [https://chat.lmsys.org/](https://chat.lmsys.org/) > > #### Uses > > The primary use of Vicuna is for research on large language models and chatbots. The target user base includes researchers and hobbyists interested in natural language processing, machine learning, and artificial intelligence. > > #### How to Get Started with the Model > > - Command line interface: https://github.com/lm-sys/FastChat#vicuna-weights > - APIs (OpenAI API, Huggingface API): https://github.com/lm-sys/FastChat/tree/main#api > > #### Training Details > > Vicuna v1.5 is fine-tuned from Llama 2 with supervised instruction fine-tuning. The model was trained on approximately 125K conversations from ShareGPT.com. > > For additional details, please refer to the "Training Details of Vicuna Models" section in the appendix of the linked paper. > > #### Evaluation Results > > ![Vicuna Evaluation Results](https://lemmy.world/pictrs/image/c99acca3-e3d2-4d62-bc68-1d829f40b11a.png) > > Vicuna is evaluated using standard benchmarks, human preferences, and LLM-as-a-judge. For more detailed results, please refer to the paper and leaderboard.
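If you want to try the unquantized weights directly, here is a minimal `transformers` sketch using the standard Vicuna v1.5 prompt template - the generation settings are just example values:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-7b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Vicuna v1.5 expects a plain USER/ASSISTANT conversation format.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "USER: What is the capital of France? ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```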

I am actively testing this out. It’s hard to say at the moment. There’s a lot to figure out when deploying a model into a live environment, but I think there’s real value in using them for technical tasks - especially as models mature and improve over time.

At the moment, though, performance is closer to GPT-3.5 than GPT-4, but I wouldn’t be surprised if this is no longer the case within the next year or so.


Free Open-Source AI LLM Guide
cross-posted from: https://lemmy.world/post/2219010 > Hello everyone! > > We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of !fosai@lemmy.world. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all. > > It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI. > > The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon! > > In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see. > > --- > > ## **Getting Started With Free Open-Source AI** > > Have no idea where to begin with AI / LLMs? Try starting with our [Lemmy Crash Course for Free Open-Source AI](https://lemmy.world/post/76020). > > When you're ready to explore more resources see our [FOSAI Nexus](https://lemmy.world/post/814816) - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology. > > If you're looking to jump right in, I recommend downloading [oobabooga's text-generation-webui](https://github.com/oobabooga/text-generation-webui) and installing one of the LLMs from [TheBloke](https://huggingface.co/TheBloke) below. > > Try both GGML and GPTQ variants to see which model type performs to your preference. See the hardware table to get a better idea of which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B). > > ### **8-bit System Requirements** > > | Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* | > |-----------|-----------|--------------------|-------------------|-------------------| > | LLaMA-7B | 9.2GB | 10GB | 3060 12GB, 3080 10GB | 24 GB | > | LLaMA-13B | 16.3GB | 20GB | 3090, 3090 Ti, 4090 | 32 GB | > | LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64 GB | > | LLaMA-65B | 74GB | 80GB | A100 80GB | 128 GB | > > ### **4-bit System Requirements** > > | Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* | > |-----------|--------------------|--------------------------------|-------------------| > | LLaMA-7B | 6GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB | > | LLaMA-13B | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB | > | LLaMA-30B | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB | > | LLaMA-65B | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 | 64 GB | > > *System RAM (not VRAM) is used to initially load a model. You can use swap space if you do not have enough RAM to support your LLM. > > When in doubt, try starting with 3B or 7B models and work your way up to 13B+. A quick GPTQ loading sketch follows below. 
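> Once you've picked a GPTQ model from the hub below, loading it can be as simple as this sketch with `auto-gptq` - settings vary per model, so check each model card for the recommended loader options:
>
> ```
> from transformers import AutoTokenizer
> from auto_gptq import AutoGPTQForCausalLM
>
> repo = "TheBloke/Llama-2-7B-GPTQ"
> tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=True)
> model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)
>
> inputs = tokenizer("The three laws of robotics are", return_tensors="pt").to("cuda:0")
> print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
> ```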
> > ### **FOSAI Resources** > > **Fediverse / FOSAI** > - [The Internet is Healing](https://www.youtube.com/watch?v=TrNE2fSCeFo) > - [FOSAI Welcome Message](https://lemmy.world/post/67758) > - [FOSAI Crash Course](https://lemmy.world/post/76020) > - [FOSAI Nexus Resource Hub](https://lemmy.world/post/814816) > > **LLM Leaderboards** > - [HF Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) > - [LMSYS Chatbot Arena](https://chat.lmsys.org/?leaderboard) > > **LLM Search Tools** > - [LLM Explorer](https://llm.extractum.io/) > - [Open LLMs](https://github.com/eugeneyan/open-llms) > > --- > > ## **Large Language Model Hub** > > [Download Models](https://huggingface.co/TheBloke) > > ### [oobabooga](https://github.com/oobabooga/text-generation-webui) > text-generation-webui - a big community-favorite Gradio web UI by oobabooga designed for running almost any free open-source large language model downloaded off of [HuggingFace](https://huggingface.co/TheBloke), including (but not limited to) LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) of text generation. It is highly compatible with many formats. > > ### [Exllama](https://github.com/turboderp/exllama) > A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs. > > ### [gpt4all](https://github.com/nomic-ai/gpt4all) > Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade processors. > > ### [TavernAI](https://github.com/TavernAI/TavernAI) > The original branch of software SillyTavern was forked from. This chat interface offers very similar functionalities but has fewer cross-client compatibilities with other chat and API interfaces (compared to SillyTavern). > > ### [SillyTavern](https://github.com/SillyTavern/SillyTavern) > Developer-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI(Claude!)), Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8 > > ### [Koboldcpp](https://github.com/LostRuins/koboldcpp) > A self contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights. > > ### [KoboldAI-Client](https://github.com/KoboldAI/KoboldAI-Client) > This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed. 
> > ### [h2oGPT](https://github.com/h2oai/h2ogpt) > h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document(s) question-answer capabilities. Documents help to ground LLMs against hallucinations by providing them context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer. > > --- > > ## **Models** > > ### The Bloke > The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs). > > These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home. > > Support [TheBloke](https://huggingface.co/TheBloke) here. > > - [https://ko-fi.com/TheBlokeAI](https://ko-fi.com/TheBlokeAI) > > --- > > #### 70B > - [Llama-2-70B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ) > - [Llama-2-70B-Chat-GGML](https://huggingface.co/TheBloke/Llama-2-70B-Chat-GGML) > > - [Llama-2-70B-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-GPTQ) > - [Llama-2-70B-GGML](https://huggingface.co/TheBloke/Llama-2-70B-GGML) > > - [llama-2-70b-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-70b-Guanaco-QLoRA-GPTQ) > > --- > > #### 30B > - [30B-Epsilon-GPTQ](https://huggingface.co/TheBloke/30B-Epsilon-GPTQ) > > --- > > #### 13B > - [Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) > - [Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML) > > - [Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ) > - [Llama-2-13B-GGML](https://huggingface.co/TheBloke/Llama-2-13B-GGML) > > - [llama-2-13B-German-Assistant-v2-GPTQ](https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GPTQ) > - [llama-2-13B-German-Assistant-v2-GGML](https://huggingface.co/TheBloke/llama-2-13B-German-Assistant-v2-GGML) > > - [13B-Ouroboros-GGML](https://huggingface.co/TheBloke/13B-Ouroboros-GGML) > - [13B-Ouroboros-GPTQ](https://huggingface.co/TheBloke/13B-Ouroboros-GPTQ) > > - [13B-BlueMethod-GGML](https://huggingface.co/TheBloke/13B-BlueMethod-GGML) > - [13B-BlueMethod-GPTQ](https://huggingface.co/TheBloke/13B-BlueMethod-GPTQ) > > - [llama-2-13B-Guanaco-QLoRA-GGML](https://huggingface.co/TheBloke/llama-2-13B-Guanaco-QLoRA-GGML) > - [llama-2-13B-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-13B-Guanaco-QLoRA-GPTQ) > > - [Dolphin-Llama-13B-GGML](https://huggingface.co/TheBloke/Dolphin-Llama-13B-GGML) > - [Dolphin-Llama-13B-GPTQ](https://huggingface.co/TheBloke/Dolphin-Llama-13B-GPTQ) > > - [MythoLogic-13B-GGML](https://huggingface.co/TheBloke/MythoLogic-13B-GGML) > - [MythoBoros-13B-GPTQ](https://huggingface.co/TheBloke/MythoBoros-13B-GPTQ) > > - [WizardLM-13B-V1.2-GPTQ](https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GPTQ) > - [WizardLM-13B-V1.2-GGML](https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGML) > > - [OpenAssistant-Llama2-13B-Orca-8K-3319-GGML](https://huggingface.co/TheBloke/OpenAssistant-Llama2-13B-Orca-8K-3319-GGML) > > --- > > #### 7B > - [Llama-2-7B-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-GPTQ) > - [Llama-2-7B-GGML](https://huggingface.co/TheBloke/Llama-2-7B-GGML) > > - [Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ) > - [LLongMA-2-7B-GPTQ](https://huggingface.co/TheBloke/LLongMA-2-7B-GPTQ) > > - 
[llama-2-7B-Guanaco-QLoRA-GPTQ](https://huggingface.co/TheBloke/llama-2-7B-Guanaco-QLoRA-GPTQ) > - [llama-2-7B-Guanaco-QLoRA-GGML](https://huggingface.co/TheBloke/llama-2-7B-Guanaco-QLoRA-GGML) > > - [llama2_7b_chat_uncensored-GPTQ](https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GPTQ) > - [llama2_7b_chat_uncensored-GGML](https://huggingface.co/TheBloke/llama2_7b_chat_uncensored-GGML) > > --- > > ## **More Models** > - [Any of KoboldAI's Models](https://huggingface.co/KoboldAI) > > - [Luna-AI-Llama2-Uncensored-GPTQ](https://huggingface.co/TheBloke/Luna-AI-Llama2-Uncensored-GPTQ) > > - [Nous-Hermes-Llama2-GGML](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-GGML) > - [Nous-Hermes-Llama2-GPTQ](https://huggingface.co/TheBloke/Nous-Hermes-Llama2-GPTQ) > > - [FreeWilly2-GPTQ](https://huggingface.co/TheBloke/FreeWilly2-GPTQ) > > --- > > ## **GL, HF!** > > Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community. > > If you haven't already, consider subscribing to the free open-source AI community at !fosai@lemmy.world where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge. > > Thank you for reading!
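Want to kick the tires on one of these conversions without standing up a full web UI? Here is a minimal sketch using the llama-cpp-python bindings to load a GGML file (the filename below is only an example; substitute whichever quantization you downloaded from TheBloke):

```python
# pip install llama-cpp-python
from llama_cpp import Llama

# Path to a GGML file from one of TheBloke's repos (example filename).
llm = Llama(
    model_path="./llama-2-7b-chat.ggmlv3.q4_0.bin",
    n_ctx=2048,        # context window
    n_gpu_layers=0,    # raise this to offload layers to a GPU
)

output = llm(
    "Q: Name three open-source LLM front-ends. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```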

Llama-2 FOSAI & LLM Roundup Series! (Summer 2023 Edition)
cross-posted from: https://lemmy.world/post/1894070 > ## **Welcome to the Llama-2 FOSAI & LLM Roundup Series!** > > **(Summer 2023 Edition)** > > Hello everyone! > > The wave of innovation I mentioned in our [Llama-2 announcement](https://lemmy.world/post/1750098) is already on its way. The first tsunami of base models and configurations is being released as you read this post. > > That being said, I'd like to take a moment to shout out [TheBloke](https://huggingface.co/TheBloke), who is rapidly converting many of these models for the greater good of FOSS & FOSAI. > > You can support [TheBloke](https://huggingface.co/TheBloke) here. > - https://ko-fi.com/TheBlokeAI > > Below you will find all of the latest Llama-2 models that are FOSAI friendly. This means they are commercially available, ready to use, and open for development. I will be continuing this series exclusively for Llama models; I have a feeling they will remain a popular choice for quite some time. I will consider giving other foundational models a similar series if they garner enough support and consideration. For now, enjoy this new herd of Llamas! > > All that you need to get started is capable hardware and a few moments to set up your inference platform (selected from any of your preferred software choices in the [Lemmy Crash Course for Free Open-Source AI](https://lemmy.world/post/76020) or [FOSAI Nexus](https://lemmy.world/post/814816) resource, which is also shared at the bottom of this post). > > Keep reading to learn more about the exciting new models coming out of Llama-2! > > ### **8-bit System Requirements** > > | Model | VRAM Used | Minimum Total VRAM | Card Examples | RAM/Swap to Load* | > |-----------|-----------|--------------------|-------------------|-------------------| > | LLaMA-7B | 9.2GB | 10GB | 3060 12GB, 3080 10GB | 24 GB | > | LLaMA-13B | 16.3GB | 20GB | 3090, 3090 Ti, 4090 | 32 GB | > | LLaMA-30B | 36GB | 40GB | A6000 48GB, A100 40GB | 64 GB | > | LLaMA-65B | 74GB | 80GB | A100 80GB | 128 GB | > > ### **4-bit System Requirements** > > | Model | Minimum Total VRAM | Card Examples | RAM/Swap to Load* | > |-----------|--------------------|--------------------------------|-------------------| > | LLaMA-7B | 6GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 | 6 GB | > | LLaMA-13B | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 12 GB | > | LLaMA-30B | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB | > | LLaMA-65B | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 | 64 GB | > > *System RAM (not VRAM) is used to initially load a model. You can use swap space if you do not have enough RAM to support your LLM. A quick back-of-the-envelope sketch for estimating these numbers is included at the end of this post. > > --- > > ### **The Bloke** > One of the most popular and consistent developers releasing consumer-friendly versions of LLMs. These active conversions of trending models allow many of us to run GPTQ or GGML variants at home on our own PCs and hardware. 
> > **70B** > > - [TheBloke/Llama-2-70B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ) > > - [TheBloke/Llama-2-70B-Chat-fp16](https://huggingface.co/TheBloke/Llama-2-70B-Chat-fp16) > > - [TheBloke/Llama-2-70B-GPTQ](https://huggingface.co/TheBloke/Llama-2-70B-GPTQ) > > - [TheBloke/Llama-2-70B-fp16](https://huggingface.co/TheBloke/Llama-2-70B-fp16) > > **13B** > > - [TheBloke/Llama-2-13B-chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ) > > - [TheBloke/Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML) > > - [TheBloke/Llama-2-13B-GPTQ](https://huggingface.co/TheBloke/Llama-2-13B-GPTQ) > > - [TheBloke/Llama-2-13B-GGML](https://huggingface.co/TheBloke/Llama-2-13B-GGML) > > - [TheBloke/Llama-2-13B-fp16](https://huggingface.co/TheBloke/Llama-2-13B-fp16) > > **7B** > > - [TheBloke/Llama-2-7B-GPTQ](https://huggingface.co/TheBloke/Llama-2-7B-GPTQ) > > - [TheBloke/Llama-2-7B-GGML](https://huggingface.co/TheBloke/Llama-2-7B-GGML) > > - [TheBloke/Llama-2-7B-fp16](https://huggingface.co/TheBloke/Llama-2-7B-fp16) > > - [TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ) > > ### **LLongMA** > LLongMA-2 is a suite of Llama-2 models trained at 8k context length using linear positional interpolation scaling. > > **13B** > > - [conceptofmind/LLongMA-2-13b](https://huggingface.co/conceptofmind/LLongMA-2-13b) > > **7B** > > - [conceptofmind/LLongMA-2-7b](https://huggingface.co/conceptofmind/LLongMA-2-7b) > > Also available from The Bloke in GPTQ and GGML formats: > > **7B** > > - [TheBloke/LLongMA-2-7B-GPTQ](https://huggingface.co/TheBloke/LLongMA-2-7B-GPTQ) > > - [TheBloke/LLongMA-2-7B-GGML](https://huggingface.co/TheBloke/LLongMA-2-7B-GGML) > > ### **Puffin** > The first commercially available language model released by Nous Research, available at 13B parameters! > > **13B** > > - [NousResearch/Redmond-Puffin-13B-GGML](https://huggingface.co/NousResearch/Redmond-Puffin-13B-GGML) > > - [NousResearch/Redmond-Puffin-13B](https://huggingface.co/NousResearch/Redmond-Puffin-13B) > > Also available from The Bloke in GPTQ and GGML formats: > > **13B** > > - [TheBloke/Redmond-Puffin-13B-GPTQ](https://huggingface.co/TheBloke/Redmond-Puffin-13B-GPTQ) > > - [TheBloke/Redmond-Puffin-13B-GGML](https://huggingface.co/TheBloke/Redmond-Puffin-13B-GGML) > > ### **Other Models** > A section for 'other' LLMs and fine-tunings derived from Llama-2 models. > > **7B** > > - [georgesung/llama2_7b_chat_uncensored](https://huggingface.co/georgesung/llama2_7b_chat_uncensored) > > --- > > ### **Getting Started w/ FOSAI!** > > Have no idea where to begin with AI/LLMs? Try starting [here](https://understandgpt.ai/docs/getting-started/what-is-a-llm) with [UnderstandGPT](https://understandgpt.ai/) to learn the basics of LLMs before visiting our [Lemmy Crash Course for Free Open-Source AI](https://lemmy.world/post/76020). > > If you're looking to explore more resources, see our [FOSAI Nexus](https://lemmy.world/post/814816) for a list of all the major FOSS/FOSAI projects in the space. > > If you're looking to jump right in, visit some of the links below and stick to models that are <13B parameters (unless you have the power and hardware to spare). 
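For example, here is a small sketch of grabbing a single quantized file from one of the repos above with the huggingface_hub library (the filename is illustrative; check the repo's file list for the exact quantization you want):

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Example repo and filename - browse the repo's "Files" tab for real names.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGML",
    filename="llama-2-7b.ggmlv3.q4_0.bin",
)
print("Model saved to:", path)
```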
> > **FOSAI Resources** > > **Fediverse / FOSAI** > - [The Internet is Healing](https://www.youtube.com/watch?v=TrNE2fSCeFo) > - [FOSAI Welcome Message](https://lemmy.world/post/67758) > - [FOSAI Crash Course](https://lemmy.world/post/76020) > - [FOSAI Nexus Resource Hub](https://lemmy.world/post/814816) > > **LLM Leaderboards** > - [HF Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) > - [LMSYS Chatbot Arena](https://chat.lmsys.org/?leaderboard) > > **LLM Search Tools** > - [LLM Explorer](https://llm.extractum.io/) > - [Open LLMs](https://github.com/eugeneyan/open-llms) > > ### **GL, HF!** > > If you found anything about this post interesting - consider subscribing to !fosai@lemmy.world where I do my best to keep you in the know about the most important updates in free open-source artificial intelligence. > > I will try to continue doing this series season by season, making this a living post for the rest of this summer. If I have missed a noteworthy model, don't hesitate to let me know in the comments so I can keep this resource up-to-date. > > Thank you for reading! I hope you find what you're looking for. Be sure to subscribe and bookmark the [main post](https://lemmy.world/post/1894070) if you want a quick one-stop shop for all of the new Llama-2 models that will be emerging the rest of this summer!
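As promised above, here is a rough back-of-the-envelope sketch for estimating the VRAM figures in the system requirement tables. Real usage varies with context length, activations, and implementation overhead, so treat it as a sanity check rather than a guarantee:

```python
def estimate_vram_gb(params_billions: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight memory only, padded ~20% for inference overhead."""
    weight_bytes = params_billions * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 1024**3

for size in (7, 13, 30, 65):
    print(f"LLaMA-{size}B  8-bit: ~{estimate_vram_gb(size, 8):5.1f} GB"
          f"   4-bit: ~{estimate_vram_gb(size, 4):5.1f} GB")
```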

Introducing Llama 2 - Meta’s Next-Generation Commercially Viable Open-Source AI & LLM
cross-posted from: https://lemmy.world/post/1750098 > ## **[Introducing Llama 2 - Meta's Next Generation Free Open-Source Artificially Intelligent Large Language Model](https://ai.meta.com/llama/)** > > ![Llama 2](https://lemmy.world/pictrs/image/00096be3-c55f-40e8-b37b-33c29fb43cc2.png) > > It's incredible it's already here! This is great news for everyone in free open-source artificial intelligence. > > Llama 2 unleashes Meta's (previously) closed model (Llama) to become free open-source AI, accelerating access and development for large language models (LLMs). > > This marks a significant step in machine learning and deep learning technologies. With this move, a widely supported LLM can become a viable choice for businesses, developers, and entrepreneurs to innovate our future using a model that the community has been eagerly awaiting since its [initial leak earlier this year](https://www.theverge.com/2023/3/8/23629362/meta-ai-language-model-llama-leak-online-misuse). > > - [Meta Announcement](https://ai.meta.com/llama/) > - [Meta Overview](https://ai.meta.com/resources/models-and-libraries/llama/) > - [Github](https://github.com/facebookresearch/llama/tree/main) > - [Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) > > **Here are some highlights from the [official Meta AI announcement](https://ai.meta.com/llama/):** > > ## **Llama 2** > > >In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. > > > >Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs. > > >Llama 2 pretrained models are trained on 2 trillion tokens, and have double the context length of Llama 1. Its fine-tuned models have been trained on over 1 million human annotations. > > ## **Inside the Model** > > - [Technical details](https://ai.meta.com/resources/models-and-libraries/llama/) > > ### With each model download you'll receive: > > - Model code > - Model Weights > - README (User Guide) > - Responsible Use Guide > - License > - Acceptable Use Policy > - Model Card > > ## **Benchmarks** > > >Llama 2 outperforms other open source language models on many external benchmarks, including reasoning, coding, proficiency, and knowledge tests. It was pretrained on publicly available online data sources. The fine-tuned model, Llama-2-chat, leverages publicly available instruction datasets and over 1 million human annotations. > > ![](https://lemmy.world/pictrs/image/6ea0eaf2-2bd1-44b3-ae9e-7e5868283f51.png) > > ## **RLHF & Training** > > >Llama-2-chat uses reinforcement learning from human feedback to ensure safety and helpfulness. Training Llama-2-chat: Llama 2 is pretrained using publicly available online data. An initial version of Llama-2-chat is then created through the use of supervised fine-tuning. Next, Llama-2-chat is iteratively refined using Reinforcement Learning from Human Feedback (RLHF), which includes rejection sampling and proximal policy optimization (PPO). 
> > ![](https://lemmy.world/pictrs/image/3a0c7364-8733-4404-b1cd-313e617dc604.png) > > ## **The License** > > >Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. Our mission is to empower individuals, and industry through this opportunity, while fostering an environment of discovery and ethical AI advancements. > > >**Partnerships** > > >We have a broad range of supporters around the world who believe in our open approach to today’s AI — companies that have given early feedback and are excited to build with Llama 2, cloud providers that will include the model as part of their offering to customers, researchers committed to doing research with the model, and people across tech, academia, and policy who see the benefits of Llama and an open platform as we do. > > ## **The/CUT** > > With the release of Llama 2, Meta has opened up new possibilities for the development and application of large language models. This free open-source AI not only accelerates access but also allows for greater innovation in the field. > > **Take Three**: > > - **Video Game Analogy**: Just like getting a powerful, rare (or previously banned) item drop in a game, Llama 2's release gives developers a powerful tool they can use and customize for their unique quests in the world of AI. > - **Cooking Analogy**: Imagine if a world-class chef decided to share their secret recipe with everyone. That's Llama 2, a secret recipe now open for all to use, adapt, and improve upon in the kitchen of AI development. > - **Construction Analogy**: Llama 2 is like a top-grade construction tool now available to all builders. It opens up new possibilities for constructing advanced AI structures that were previously hard to achieve. > > ## **Links** > > Here are the key resources discussed in this post: > > - [Meta Announcement](https://ai.meta.com/llama/) > - [Meta Overview](https://ai.meta.com/resources/models-and-libraries/llama/) > - [Github](https://github.com/facebookresearch/llama/tree/main) > - [Paper](https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/) > - [Technical details](https://ai.meta.com/resources/models-and-libraries/llama/) > > Want to get started with free open-source artificial intelligence, but don't know where to begin? > > Try starting here: > > - [FOSAI Welcome Message](https://lemmy.world/post/67758) > - [FOSAI Crash Course](https://lemmy.world/post/76020) > - [FOSAI Nexus Resource Hub](https://lemmy.world/post/814816) > > If you found anything else about this post interesting - consider subscribing to !fosai@lemmy.world where I do my best to keep you in the know about the most important updates in free open-source artificial intelligence. > > This particular announcement is exciting to me because it may popularize open-source principles and practices for other enterprises and corporations to follow. > > We should see some interesting models emerge out of Llama 2. I for one am looking forward to seeing where this will take us next. Get ready for another wave of innovation! This one is going to be big.
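If you want to try the new models right away, here is a minimal sketch using the HuggingFace transformers library (this assumes you have requested and been granted access to the gated meta-llama weights on HuggingFace; TheBloke's quantized conversions work similarly with their respective loaders):

```python
# pip install transformers accelerate sentencepiece
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo: request access on HF first
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What makes open-source AI important?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```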

New AI/LLM Breakthrough - FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
cross-posted from: https://lemmy.world/post/1709025 > # [FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness](https://arxiv.org/abs/2205.14135) > > [FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning](https://crfm.stanford.edu/2023/07/17/flash2.html) > > ![](https://lemmy.world/pictrs/image/b09c2e17-c5a8-4f21-8b37-1faaf9e12c62.png) > > Today, we explore an exciting new development: FlashAttention-2, a breakthrough in Transformer model scaling and performance. The attention layer, a key part of Transformer model architecture, has been a bottleneck in scaling to longer sequences due to its runtime and memory requirements. FlashAttention-2 tackles this issue by improving work partitioning and parallelism, leading to significant speedups and improved efficiency for many AI/LLMs. > > The significance of this development is huge. Transformers are fundamental to many current machine learning models, used in a wide array of applications from language modeling to image understanding and audio, video, and code generation. By making attention algorithms IO-aware and improving work partitioning, FlashAttention-2 gets closer to the efficiency of General Matrix to Matrix Multiplication (GEMM) operations, which are highly optimized for modern GPUs. This enables the training of larger and more complex models, pushing the boundaries of what's possible with machine learning both at home and in the lab. > > ## Features & Advancements > > FlashAttention-2 improves upon its predecessor by tweaking the algorithm to reduce the number of non-matrix multiplication FLOPs, parallelizing the attention computation, and distributing work within each thread block. These improvements lead to an approximately 2x speedup compared to FlashAttention, reaching up to 73% of the theoretical maximum FLOPs/s. > > Relevant resources: > - [Github](https://github.com/Dao-AILab/flash-attention) > - [Blog post](https://crfm.stanford.edu/2023/07/17/flash2.html) > > ## Installation & Requirements > > To install FlashAttention-2, you'll need CUDA 11.4 and PyTorch 1.12 or above. The installation process is straightforward and can be done through pip or by compiling from source. Detailed instructions are provided on the Github page. > > Relevant resources: > - [Github](https://github.com/Dao-AILab/flash-attention) > - [Blog post](https://crfm.stanford.edu/2023/07/17/flash2.html) > > ## Supported Hardware & Datatypes > > FlashAttention-2 currently supports Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon. It supports the fp16 and bf16 datatypes (bf16 requires Ampere, Ada, or Hopper GPUs). All head dimensions up to 256 are supported. > > Relevant resources: > - [Github](https://github.com/Dao-AILab/flash-attention) > - [Blog post](https://crfm.stanford.edu/2023/07/17/flash2.html) > > ## The/CUT > > FlashAttention-2 is a significant leap forward in Transformer model scaling. By improving the efficiency of the attention layer, it allows for faster and more efficient training of larger models. This opens up new possibilities in machine learning applications, especially in systems or projects that need all the performance they can get. > > **Take Three**: > Three big takeaways from this post: > > - **Performance Boost**: FlashAttention-2 is a significant improvement in Transformer architecture and provides a massive performance boost to AI/LLM models that utilize it. 
It manages to achieve a 2x speedup compared to its predecessor, FlashAttention. This allows for faster training of larger and more complex models, which can lead to breakthroughs in various machine learning applications at home (and in the lab). > > - **Efficiency and Scalability**: FlashAttention-2 improves the efficiency of attention computation in Transformers by optimizing work partitioning and parallelism. This allows the model to scale to longer sequence lengths, increasing its applicability in tasks that require understanding of larger context, such as language modeling, high-resolution image understanding, and code, audio, and video generation. > > - **Better Utilization of Hardware Resources**: FlashAttention-2 is designed to be IO-aware, taking into account the reads and writes between different levels of GPU memory. This leads to better utilization of hardware resources, getting closer to the efficiency of optimized matrix-multiply (GEMM) operations. It currently supports Ampere, Ada, or Hopper GPUs and is planning to extend support for Turing GPUs soon. This ensures that a wider range of machine learning practitioners and researchers can take advantage of this breakthrough. > > ## Links > - [FlashAttention Paper](https://arxiv.org/abs/2205.14135) > - [FlashAttention Post](https://crfm.stanford.edu/2023/07/17/flash2.html) > - [Github Repository](https://github.com/Dao-AILab/flash-attention) If you found anything about this post interesting - consider subscribing to !fosai@lemmy.world where I do my best to keep you informed on free open-source artificial intelligence. Thank you for reading!
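A postscript for the tinkerers: once the package is installed, here is a minimal sketch of calling the FlashAttention-2 kernel directly (shapes and dtypes follow the repo's documented convention, and a supported CUDA GPU is required):

```python
# pip install flash-attn  (requires CUDA 11.4+ and PyTorch 1.12+)
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64

# FlashAttention expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU.
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # exact attention via the IO-aware kernel
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```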

Petals - Run large language models at home, BitTorrent‑style
cross-posted from: https://lemmy.world/post/1535627 > **[I'd like to share with you Petals: decentralized inference and finetuning of large language models](https://petals.ml/)** > - https://petals.ml/ > - https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models > > **[What is Petals?](https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models)** > > >Run large language models at home, BitTorrent‑style > > >Run large language models like LLaMA-65B, BLOOM-176B, or BLOOMZ-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning. > >Single-batch inference runs at 5-6 steps/sec for LLaMA-65B and ≈ 1 step/sec for BLOOM — up to 10x faster than offloading, enough for chatbots and other interactive apps. Parallel inference reaches hundreds of tokens/sec. > >Beyond classic language model APIs — you can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states. You get the comforts of an API with the flexibility of PyTorch. > > - [Colab Link](https://colab.research.google.com/drive/1uCphNY7gfAUkdDrTx21dZZwCOUDCMPw8?usp=sharing) > - [GitHub Docs](https://github.com/bigscience-workshop/petals) > > >**Overview of the Approach** > > >On a surface level, Petals works as a decentralized pipeline designed for fast inference of neural networks. It splits any given model into several blocks (or layers) that are hosted on different servers. These servers can be spread out across continents, and anybody can connect their own GPU! In turn, users can connect to this network as a client and apply the model to their data. > When a client sends a request to the network, it is routed through a chain of servers that is built to minimize the total forward pass time. Upon joining the system, each server selects the most optimal set of blocks based on the current bottlenecks within the pipeline. Below, you can see an illustration of Petals for several servers and clients running different inputs for the model. > > ![](https://lemmy.world/pictrs/image/5d01f513-588f-43a5-9185-827217ab2ebb.png) > > >**Benchmarks** > > >We compare the performance of Petals with offloading, as it is the most popular method for using 100B+ models on local hardware. We test both single-batch inference as an interactive setting and parallel forward pass throughput for a batch processing scenario. Our experiments are run on BLOOM-176B and cover various network conditions, from a few high-speed nodes to real-world Internet links. As you can see from the table below, Petals is predictably slower than offloading in terms of throughput but 3–25x faster in terms of latency when compared in a realistic setup. This means that inference (and sometimes even finetuning) is much faster with Petals, despite the fact that we are using a distributed model instead of a local one. > > ![](https://lemmy.world/pictrs/image/dad697d6-38a7-4234-b8c8-cf62d667c561.png) > > >**Conclusion** > > >Our work on Petals continues the line of research towards making the latest advances in deep learning more accessible for everybody. With this work, we demonstrate that it is feasible not only to train large models with volunteer computing, but to run their inference in such a setup as well. 
The development of Petals is an ongoing effort: it is fully open-source (hosted at https://github.com/bigscience-workshop/petals), and we would be happy to receive any feedback or contributions regarding this project! > > - [You can read the full article here](https://research.yandex.com/blog/petals-decentralized-inference-and-finetuning-of-large-language-models)
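To make the "BitTorrent-style" idea concrete, here is a sketch of the client usage pattern from the Petals README at the time of writing (class and model names may have changed since; check the GitHub docs for the current example):

```python
# pip install petals
from transformers import BloomTokenizerFast
from petals import DistributedBloomForCausalLM

model_name = "bigscience/bloom-petals"  # model name from the project's README
tokenizer = BloomTokenizerFast.from_pretrained(model_name)

# Only the embeddings live locally; the transformer blocks are served by peers.
model = DistributedBloomForCausalLM.from_pretrained(model_name)

inputs = tokenizer("A cat in Paris", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))
```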

Mark Zuckerberg & Meta to Release Commercial Version of its AI/LLM (LLaMA) In Effort to Catch Rivals
cross-posted from: https://lemmy.world/post/1428161 > **[Mark Zuckerberg & Meta to Release Commercial Version of its AI/LLM (LLaMA) In Effort to Catch Rivals](https://archive.is/WS877)** > - https://archive.is/WS877 > > Hello everyone. I have some very exciting news to share with you today. Mark Zuckerberg & Meta are poised to release a commercial version of LLaMA in the near future. This is huge for us! > > The current generation of LLaMA is released under a non-commercial license for research use only. This means the large community behind it has been unable to utilize it in wider business applications of their own, hindering innovation. > > With this, Mark may open up a channel for another surge in open-source AI that accelerates us forward, (hopefully) ahead of mutual competitors like Google, OpenAI, and others. > > You should read the full article [here](https://archive.is/WS877), but I will leave you with highlights below. > > >Meta released its own language model to researchers and academics earlier this year, but the new version will be more widely available and customisable by companies © FT montage/Bloomberg/Dreamstime > > >Meta is poised to release a commercial version of its artificial intelligence model, allowing start-ups and businesses to build custom software on top of the technology. > > >The move will allow Meta to compete with Microsoft-backed OpenAI and Google, which are surging ahead in the race to develop generative AI. The software, which can create text, images and code, is powered by large language models (LLMs) that are trained on huge amounts of data and require vast computing power. > > >Meta released its own language model, known as LLaMA, to researchers and academics earlier this year, but the new version will be more widely available and customisable by companies, three people familiar with the plans said. The release is expected imminently, one of the people said. > > >Meta says its LLMs are “open-source”, by which it means details of the new model will be released publicly. This contrasts with the approach of competitors such as OpenAI, whose latest model GPT-4 is a so-called black box in which the data and code used to build the model are not available to third parties. > > >“The competitive landscape of AI is going to completely change in the coming months, in the coming weeks maybe, when there will be open source platforms that are actually as good as the ones that are not,” vice-president and chief AI scientist at Meta, Yann LeCun, said at a conference in Aix-en-Provence last Saturday. > > >Meta’s impending release comes as a race among Silicon Valley tech groups to establish themselves as dominant AI participants is heating up. > > **The/CUT** (TLDR) > > Mark Zuckerberg's Meta is set to shake up the AI race with the release of a commercially available version of their language model, LLaMA. The move is poised to democratize AI by making the model open-source and customizable by startups and businesses, fostering a broader community of innovation. As the AI landscape heats up with giants like OpenAI and Google, Meta's decision encourages an open view of the future, one in which collaboration with LLMs could redefine the bounds of technology. > > What do you think about this news? How do you feel about Zuck making LLaMA commercially available? Do you have any projects you plan to build with this knowledge? > > Only time will tell how honest he is in this release. For now, we'll have to be patient and see how it all unfolds. 
> > Best case scenario - it encourages other big tech companies to follow, accelerating our journey to SGI/AGI. Worst case scenario - we get a massive influx of cool new LLaMA models. > > It's a win for us either way! We take those. > > If you found any of this interesting, consider subscribing to !fosai@lemmy.world where I do my best to keep you in the know with the latest developments and breakthroughs in free open-source artificial intelligence. > > Thank you for reading! Check out some of these resources below if you want to learn more about this announcement (or get started with free, open-source AI). > > **Related Links** > - [Meta to release commercial AI model in effort to catch rivals](https://archive.is/WS877) > - [Mark Zuckerberg Hinting At This w/ Lex Fridman in a Recent Podcast](https://www.youtube.com/watch?v=Ff4fRgnuFgQ) > - [Free Open-Source AI Nexus of Resources](https://lemmy.world/post/814816) > - [Lemmy Crash Course to Free Open-Source AI](https://lemmy.world/post/76020)

All of these are great thoughts and ponderings! Totally correct in the right circumstances, too.

Massive context lengths that can retain coherent memory and attention over long periods of time would enable all sorts of breakthroughs in LLM technology. At that point, you would be held back by performance, compute, and datasets rather than by LLM context windows and short-term memory, and the focus would shift toward optimizing attention and improving speed and accuracy.

Let’s say you had hundreds of pages of a digital journal and felt like feeding them to a local LLM (where your data stays private). If the model ran at sufficiently high quality, you could have an AI assistant, coach, partner, or tutor caught up to speed on your project’s goals, your personal aspirations, and your daily life within a matter of hours (or a few weeks, depending on hardware capabilities).
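Even before truly unlimited context arrives, you can approximate this today by packing as much of the journal as fits into the model’s window. Here’s a toy sketch of that budgeting step in plain Python (the `ask_local_llm` call at the end is hypothetical, standing in for whatever local inference stack you run):

```python
def pack_context(pages: list[str], question: str, budget_chars: int = 8000) -> str:
    """Pack the most recent journal pages into a prompt until the budget is hit.
    budget_chars is a crude stand-in for a real token budget."""
    picked: list[str] = []
    used = len(question)
    for page in reversed(pages):          # newest entries first
        if used + len(page) > budget_chars:
            break
        picked.append(page)
        used += len(page)
    context = "\n\n".join(reversed(picked))  # restore chronological order
    return f"My journal:\n{context}\n\nQuestion: {question}"

journal_pages = [
    "Day 1: Sketched the roguelike's tileset pipeline.",
    "Day 2: Long walk; thought about attention costs.",
]
prompt = pack_context(journal_pages, "What goals keep recurring this year?")
print(prompt)
# ask_local_llm(prompt)  # hypothetical call into your local inference stack
```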

Missing areas of expertise you want your AI to have? Upload and feed it more datasets, Matrix-style; any text-based information that humanity has shared online is available to the model.

From here, you could fine-tune further and give your LLM a persona, creating an assistant and personal operating system that breaks down your life with you. Or you could simply ‘chat’ with your life and the pages you fed it, reflecting on your thoughts and memories through a model tuned to a superintelligence beyond your own.

Poses some fascinating questions, doesn’t it? About consciousness? Thought? You? This is the sort of stuff that keeps me up at night… If you trained a private LLM on your own notes, thoughts, reflections and introspection, wouldn’t you be imposing a level of consciousness into a system far beyond your own mental capacities? I have already started to use LLMs on the daily. In the right conditions, I would absolutely utilize a tool like this. We’re not at super intelligence yet, but an unlimited context window for a model of that caliber would be groundbreaking.

Information of any kind could be digitized and formatted into datasets (at massive lengths), enabling this assistant or personal database to grow over time with a project, with you, and with your life, learning and discovering things alongside you. At that point, we’re starting to get into augmented human capabilities.
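As a toy illustration of that formatting step, here’s a minimal sketch that turns notes into a JSONL file in the prompt/response style many fine-tuning pipelines accept (field names vary by trainer, so match whatever yours expects):

```python
import json

notes = [
    ("What did I decide about the game engine?", "Stick with the custom engine; revisit in Q4."),
    ("What goal keeps recurring?", "Ship the pre-alpha and release the models as open source."),
]

# One JSON object per line - the de facto format for fine-tuning datasets.
with open("journal_dataset.jsonl", "w", encoding="utf-8") as f:
    for prompt, response in notes:
        f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
```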

What this would mean over the course of many years, and across breakthroughs in models and training methods, is a fascinating thought experiment to consider for a society where everyone uses massive-context-length LLMs regularly.

Sci-fi is quickly becoming reality. How exciting! I’m here for it, that’s for sure. Let’s hope the technology stays free, open, and accessible for all of us to participate in its marvels.


Microsoft Announces: LongNet - Scaling LLM Transformers to 1,000,000,000 Tokens & Context Length
cross-posted from: https://lemmy.world/post/1115513 > **[Microsoft Announces a New Breakthrough](https://arxiv.org/pdf/2307.02486.pdf): [LongNet](https://github.com/microsoft/unilm/tree/master): Scaling AI/LLM Transformers to 1,000,000,000 Tokens & Context Length** > > Official Microsoft Breakthroughs: > - https://arxiv.org/pdf/2307.02486.pdf > - https://github.com/microsoft/unilm/tree/master > > See one of the first implementations of LongNet here: > - https://github.com/kyegomez/LongNet > > In the realm of large language models, scaling sequence length has emerged as a significant challenge. Current methods often grapple with computational complexity or model expressivity, limiting the maximum sequence length. This paper introduces LongNet, a Transformer variant designed to scale sequence length to over 1 billion tokens without compromising performance on shorter sequences. The key innovation is dilated attention, which exponentially expands the attentive field as the distance increases. > > **Features** > > LongNet offers several compelling advantages: > > - Linear Computation Complexity: It maintains a linear computational complexity and a logarithmic dependency between tokens. > - Distributed Trainer: LongNet can serve as a distributed trainer for extremely long sequences. > - Dilated Attention: This new feature is a drop-in replacement for standard attention and can be seamlessly integrated with existing Transformer-based optimization. > - (+ many others that are hard to fit here - [please read the full paper here](https://arxiv.org/pdf/2307.02486.pdf) for more insights) > > Experimental results show that LongNet delivers strong performance on both long-sequence modeling and general language tasks. This work paves the way for modeling very long sequences, such as treating an entire corpus or even the whole Internet as a sequence. > > If computation and inference hurdles are continually overcome the way they are now - we may be seeing near infinite context lengths sooner than many had initially thought. How exciting! > > **Arxiv Paper | The Abstract:** > > ![](https://lemmy.world/pictrs/image/0ede375d-3328-4dac-aaab-35adb3dfcbda.png) > > (take this graph with a grain of salt - this is not indicative of logarithmic scaling) > > >Scaling sequence length has become a critical demand in the era of large language > models. However, existing methods struggle with either computational complexity > or model expressivity, rendering the maximum sequence length restricted. In this > work, we introduce LONGNET, a Transformer variant that can scale sequence > length to more than 1 billion tokens, without sacrificing the performance on shorter > sequences. Specifically, we propose dilated attention, which expands the attentive > field exponentially as the distance grows. LONGNET has significant advantages: > > >1) It has a linear computation complexity and a logarithm dependency between tokens. > > >2) It can be served as a distributed trainer for extremely long sequences. > > >3) Its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with the existing Transformer-based optimization. > > >Experiments results demonstrate that LONGNET yields strong performance on both long-sequence modeling and general language tasks. > > >Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence. Code is available at https://aka.ms/LongNet. 
> > ![](https://lemmy.world/pictrs/image/d5759505-1abf-4923-b839-aa6a050441d8.png) > > [Click here to read the rest of the paper](https://arxiv.org/pdf/2307.02486.pdf)!
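To build some intuition for the dilated attention idea described above, here is a toy sketch (my own illustration, not the official implementation) of the pattern: the sequence is split into segments, each segment keeps only every r-th position, and segment length and dilation rate grow together:

```python
import numpy as np

def dilated_attention_mask(seq_len, segment_lengths, dilation_rates):
    """True where position i may attend position j under at least one
    (segment length w, dilation rate r) configuration."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for w, r in zip(segment_lengths, dilation_rates):
        for start in range(0, seq_len, w):
            kept = np.arange(start, min(start + w, seq_len), r)
            mask[np.ix_(kept, kept)] = True  # sparse attention within this segment
    return mask

mask = dilated_attention_mask(64, segment_lengths=[8, 16, 32], dilation_rates=[1, 2, 4])
print(mask.sum(), "of", 64 * 64, "pairs attended")  # far sparser than dense attention
```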

News: OpenAI Introduces Superalignment
cross-posted from: https://lemmy.world/post/1102882 > **On 07/05/23, OpenAI Has Announced a New Initiative:** > > **[Superalignment](https://openai.com/blog/introducing-superalignment)** > > - https://openai.com/blog/introducing-superalignment > > Here are a few notes from their article, which you should [read in its entirety](https://openai.com/blog/introducing-superalignment). > > >**Introducing Superalignment** > > > We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us. > > >Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction. > > >While superintelligence seems far off now, we believe it could arrive this decade. > > >Here we focus on superintelligence rather than AGI to stress a much higher capability level. We have a lot of uncertainty over the speed of development of the technology over the next few years, so we choose to aim for the more difficult target to align a much more capable system. > > >Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment: > > >How do we ensure AI systems much smarter than humans follow human intent? > > >Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs. > > >Other assumptions could also break down in the future, like favorable generalization properties during deployment or our models’ inability to successfully detect and undermine supervision during training. > > >**Our approach** > > >Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence. > > >To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline: > > >- 1.) To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization). > > >- 2.) To validate the alignment of our systems, we automate search for problematic behavior (robustness) and problematic internals (automated interpretability). > > >- 3.) Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing). 
> > >We expect our research priorities will evolve substantially as we learn more about the problem and we’ll likely add entirely new research areas. We are planning to share more on our roadmap in the future. > > >**The new team** > > >We are assembling a team of top machine learning researchers and engineers to work on this problem. > > >We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment. Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieve our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment. > > [Click Here to Read More](https://openai.com/blog/introducing-superalignment). > > I believe this is an important notch in the timeline to AGI and Synthetic Superintelligence. I find it very interesting that OpenAI is ready to admit how close we may be to these breakthroughs as a species. I hope we can all benefit from this bright future together. > > If you found any of this interesting, please consider subscribing to [/c/FOSAI](https://lemmy.world/c/fosai)! > > Thank you for reading!

That’s okay! I hope you find what you’re looking for. If not, I’m sure someone will create a community for you soon. There are a lot of new users migrating; it’s only a matter of time before more content starts filling up the empty spaces!


If you’re interested in free, open-source artificial intelligence news, breakthroughs, and developments - you should head over and subscribe to /c/FOSAI. I’d love to have you! Say hi anytime. I do my best to avoid spam, sensationalism, and clickbait.