Subjective AI/ML Digest: April II

Last two weeks in AI and ML

Boris Tseitlin

Apr 23, 2023

AI

Generative Agents: Interactive Simulacra of Human Behavior

The authors use language models to create a believable, Sims-like simulation of a small town
Three modules:
- Memory stream: keeps record of agent experiences, retrieval is used to get memories into context
- Reflection: LLM used to transform observations from the memory stream into higher-level concepts (summaries). Triggered when the summed importance of recent observations passes a threshold.
- Planning: LLM is used to make plans. LLM is prompted based on memory stream observations to update plans.
ChatGPT used as the LLM of choice
Creepy AI vibes
Shows how far you can go with just clever prompt engineering
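
The memory stream retrieval step can be sketched roughly as follows. This is a toy illustration, not the authors' code: the `Memory` class, the `retrieve` function, the equal weighting of the three scores, the decay rate and the word-overlap relevance are all assumptions made here. The paper describes scoring memories by recency, importance and relevance, with relevance coming from embeddings rather than word overlap.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    created_at: float   # hours since simulation start (hypothetical time unit)
    importance: float   # 0..1, would be rated by the LLM in a real system


def relevance(query: str, text: str) -> float:
    """Toy relevance: Jaccard word overlap (a stand-in for embedding similarity)."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def retrieve(stream, query, now, k=2, decay=0.99):
    """Return the k memories with the highest combined score."""
    def score(m):
        recency = decay ** (now - m.created_at)  # exponential decay with age
        return recency + m.importance + relevance(query, m.text)
    return sorted(stream, key=score, reverse=True)[:k]


# Made-up observations for one agent.
stream = [
    Memory("ate breakfast at home", 1.0, 0.1),
    Memory("Isabella is planning a party at the cafe", 5.0, 0.8),
    Memory("watered the plants", 9.0, 0.1),
]
top = retrieve(stream, "who is planning the party", now=10.0, k=1)
print(top[0].text)
```

The retrieved texts would then be pasted into the prompt, which is how old but important memories can still make it into a limited context window.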

ML

LLMs

MiniGPT-4

Quite small but effective free LLM with image-text capabilities
Released weights
Aligns a frozen visual encoder from BLIP-2 with a frozen LLM, Vicuna (13B), using just one projection layer
Model can answer questions about images, read handwriting, and generate rap songs inspired by images.
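
A minimal sketch of the "one projection layer" idea, assuming a plain linear layer and toy dimensions. Everything here (`make_projection`, `project`, the sizes, and the way image tokens are joined with text embeddings) is a hypothetical illustration, not MiniGPT-4's actual code. The point is that both big models stay frozen and only the small mapping between them would be trained.

```python
import random

VISION_DIM = 4   # toy size; real visual encoders output far wider features
LLM_DIM = 6      # toy size of the language model's embedding space


def make_projection(in_dim, out_dim, seed=0):
    """Create the weights and bias of one linear layer (the only trainable part)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def project(visual_tokens, weights, bias):
    """Map each visual feature vector into the LLM embedding space: y = Wx + b."""
    return [
        [sum(w * x for w, x in zip(row, token)) + b for row, b in zip(weights, bias)]
        for token in visual_tokens
    ]


# Stand-ins for frozen-model outputs: 3 image tokens and 2 text-token embeddings.
image_tokens = [[0.5] * VISION_DIM for _ in range(3)]
text_embeddings = [[0.0] * LLM_DIM for _ in range(2)]

weights, bias = make_projection(VISION_DIM, LLM_DIM)
soft_prompt = project(image_tokens, weights, bias)

# Assumed layout: projected image tokens are placed before the text embeddings.
llm_input = soft_prompt + text_embeddings
print(len(llm_input), len(llm_input[0]))
```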

Stability AI Launches the First of its StableLM Suite of Language Models

New open-source LLM, 3B to 7B parameters
CC BY-SA 4.0 license
New dataset built on top of The Pile, but three times larger. Not released to the public yet.

Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM

Dolly 2.0: instruction-following ChatGPT-like model by Databricks
Available on Hugging Face
Fully open source, the license allows commercial use
Based on EleutherAI's Pythia, 12B parameters
The instruction dataset, databricks-dolly-15k, is also open source. It is larger than OpenAI’s InstructGPT dataset (13k).

DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

Quote: A single script capable of taking a pre-trained Huggingface model, running it through all three steps of InstructGPT training using DeepSpeed-RLHF system and producing your very own ChatGPT like model
Basically an easy-to-use implementation of the InstructGPT pipeline.
Heavily optimized engine for inference of ChatGPT-like models.

Amazon CodeWhisperer

Free Copilot alternative

Amazon Bedrock

Seems to be Hugging Face integrated into AWS

Experimenting with LLMs to Research, Reflect, and Plan

The author builds ChatGPT bots that summarize URLs, write SQL queries, run searches, imitate famous people, and answer questions based on his own writing.
Used LangChain, Railway for hosting, Pinecone for context vector storage.

Notable papers

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

New work on text-to-video generation by Nvidia.
Most impressive result in this area so far.

Internet Explorer: Targeted Representation Learning on the Open Web

It is a way to pre-train a CLIP-like model for mapping text to images. The main gain is efficiency.
“Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset”
The model samples text queries from a prior concept distribution (could be a GPT model), searches for images and downloads the top 100, updates the prior over concepts, and runs self-supervised learning on the downloaded images using a contrastive loss.
Model learns to make better queries over time.
The authors train a ResNet-50 in this way. It beats the usual CLIP ResNet-50 on the Birdsnap, Flowers, Pets and fMoW datasets while using only 2.5% as much compute and 0.5% as much data.
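
The query, download, update loop can be sketched as a toy simulation. Nothing here touches the real web or the authors' code: `fake_image_search`, `explore`, the concept list, the hidden usefulness scores and the update rule are all made up for illustration, and the self-supervised training step is left as a comment.

```python
import random

# Hypothetical concept vocabulary with a hidden "usefulness" for the target
# dataset (a real system would have to estimate this from the images it finds).
TRUE_USEFULNESS = {"sparrow": 0.9, "eagle": 0.8, "car": 0.1, "pizza": 0.05}


def fake_image_search(concept, rng, n=100):
    """Stand-in for a web image search: returns n noisy relevance scores."""
    base = TRUE_USEFULNESS[concept]
    return [min(1.0, max(0.0, rng.gauss(base, 0.1))) for _ in range(n)]


def explore(steps=200, lr=0.2, seed=0):
    rng = random.Random(seed)
    weights = {c: 1.0 for c in TRUE_USEFULNESS}  # uniform prior over concepts
    for _ in range(steps):
        concepts = list(weights)
        # 1. Sample a text query from the current concept distribution.
        query = rng.choices(concepts, weights=[weights[c] for c in concepts])[0]
        # 2. "Search" and "download" the top images for the query.
        images = fake_image_search(query, rng)
        # 3. (Omitted) self-supervised training step on the downloaded images.
        # 4. Score the concept by how relevant its images were, update the prior.
        reward = sum(images) / len(images)
        weights[query] = (1 - lr) * weights[query] + lr * reward
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


probs = explore()
print(sorted(probs, key=probs.get, reverse=True))
```

Over the run, concepts whose images look relevant keep a high sampling weight while the rest fade, which is the "learns to make better queries" part in miniature.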

Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions

TL;DR: ChatGPT is bad at playing text games and at building a world model
Authors make ChatGPT play Zork. They probe the ability of the model to build a map of the game world.

DINOv2: Learning Robust Visual Features without Supervision

Basically DINO with more data, bigger scale, and optimizations.

Segment Anything

SAM: zero-shot model to segment multiple objects in images.
Interactive: prompt it with a point, bounding box, mask or text description to segment an entity
Image encoder is a ViT trained via MAE.
Trained on a dataset of 11M images and 1.1B masks, which is available for download.

SiLK -- Simple Learned Keypoints

One-stage model to find keypoints
Trained in a self-supervised manner

Thoughts, news and the rest

Google DeepMind: Bringing together two world-class AI teams

On AutoGPT

Continuous doesn’t mean slow

Building LLM applications for production

Behind the curtain: what it feels like to work in AI right now

ChatGPT is the iPhone moment for AI.
“AI Safety is a real problem that is entering the discourse as a public problem.”
Being an ML influencer is easy, but pushing stuff forward is hard.
“… working in AI feels like the candle that burns bright and short. I'm oscillating between the most motivated I've ever been and some of the closest to burnt-out I've ever felt.”

Slowing down development of AI systems passing the Turing test

Yoshua Bengio on why he signed the “Pause Giant AI Experiments” letter
“We succeeded in regulating nuclear weapons on a global scale after World War II, we can reach a similar agreement for AI.”

Today’s special — scams and dangers

A couple in Canada were reportedly scammed out of $21,000 after getting a call from an AI-generated voice pretending to be their son

AI clones teen girl’s voice in $1M kidnapping scam: ‘I’ve got your daughter’

Iran installs cameras in public places to identify, penalise unveiled women

Boris Again
