MIT Develops AI Drawing Assistant to Mimic Human Sketch Creation
A team from MIT has developed an AI tool called SketchAgent, designed to simulate human sketching and support interactive creative work between humans and machines. The researchers tested the system in a web-based collaborative drawing interface where users sketch given textual concepts together with SketchAgent. The interface offers two modes: a solo mode, where users draw independently, and a collaboration mode, where the user and SketchAgent take turns adding strokes until both are satisfied with the result. Green strokes mark the user's contributions, while red strokes mark SketchAgent's. Users can also refine a sketch through dialogue, giving feedback and requesting edits as needed. (A toy illustration of this turn-taking loop appears at the end of this article.)

The team showcased a range of concepts SketchAgent can render, including robots, insects, DNA double helices, and flowcharts, extending even to abstract ideas such as theaters.

In one experiment, the researchers evaluated SketchAgent's performance when driven by different multimodal language models. When paired with Claude 3.5 Sonnet, which can generate high-quality text-based visual content, SketchAgent produced the most human-like results, outperforming models such as GPT-4o and Claude 3 Opus. "This result suggests that the way the model processes visual information differs from others," noted co-author Tamar Rott Shaham.

Despite its promise, SketchAgent still struggles with detailed or specialized drawings. It can effectively sketch basic concepts with simple strokes, but it falls short when depicting complex natural objects or specific human figures. During collaboration, however, the model can adapt and correct its own mistakes, such as a drawing of a two-headed creature. Yael Vinker explained that this may stem from the model's chain-of-thought mechanism, which breaks the sketching task into multiple steps. Expanding the model's training data could improve these sketching capabilities further. For now, the system often needs several rounds of feedback to produce more nuanced, human-like sketches.

The team plans to refine the user interface to support smoother collaboration with a variety of multimodal language models. Even so, the work already demonstrates that through step-by-step human-machine collaboration, AI can emulate human ways of thinking and produce diverse, coherent concept sketches, ultimately leading to more satisfying design outcomes.

For more information, visit the project webpage at yael-vinker.github.io/sketch-agent/ or read the MIT news article at news.mit.edu/2025/teaching-ai-models-to-sketch-more-like-humans-0602.
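To make the turn-taking collaboration concrete, here is a minimal, purely illustrative Python sketch of such a loop. Everything in it is an assumption: the article does not describe SketchAgent's actual API, so StubSketchAgent, Stroke, and collaborate are hypothetical names, and the stub agent emits a fixed stroke rather than a model-generated one.

```python
# Hypothetical sketch of the turn-taking loop described in the article.
# All names here are assumptions; the real SketchAgent API is not documented
# in the article.

from dataclasses import dataclass, field


@dataclass
class Stroke:
    points: list  # [(x, y), ...] coordinates on a shared canvas
    color: str    # "green" for the user, "red" for the agent


@dataclass
class Canvas:
    strokes: list = field(default_factory=list)

    def add(self, stroke: Stroke) -> None:
        self.strokes.append(stroke)


class StubSketchAgent:
    """Stand-in for the model: proposes one stroke per turn."""

    def propose_stroke(self, concept: str, canvas: Canvas) -> Stroke:
        # A real agent would condition on the concept and existing strokes;
        # this stub just returns a fixed red line segment.
        return Stroke(points=[(0, 0), (10, 10)], color="red")


def collaborate(concept: str, user_strokes, agent: StubSketchAgent,
                max_turns: int = 8) -> Canvas:
    """Alternate user and agent strokes until input or turns run out."""
    canvas = Canvas()
    for _turn, user_stroke in zip(range(max_turns), user_strokes):
        canvas.add(Stroke(points=user_stroke, color="green"))  # user's turn
        canvas.add(agent.propose_stroke(concept, canvas))      # agent's turn
    return canvas


if __name__ == "__main__":
    result = collaborate("sailboat",
                         [[(0, 5), (5, 0)], [(5, 0), (10, 5)]],
                         StubSketchAgent())
    print(f"{len(result.strokes)} strokes on the canvas")
```

The green and red color tags mirror the interface convention the article describes for distinguishing user strokes from SketchAgent's, and the alternating loop reflects the collaboration mode in which the two sides add strokes in turn.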