I recently spent far longer than I expected trying to recreate a public sculpture using AI. At first, I assumed the challenge would be writing the perfect prompt. It wasn't. The challenge was preserving context.
The artwork is Four Large Figures (2005) by Stephan Balkenhol, installed at UCSF Mission Bay. Four monumental carved figures stand facing different directions. Each stands on a massive section cut from the same tree trunk. The bark still wraps around the outside edge of each block, and the growth rings remain visible across the top. The trunk has been divided, but its common origin is still unmistakable.
That distinction matters. The figures do not share one continuous platform. They are physically separated. But they are materially connected. Their relationship does not depend on the wood still being attached. It holds because each base carries evidence that it came from the same living tree.
That common origin is the artwork's organizing principle. The people face away from one another. The bases stand apart. The material remembers where they came from.
Beautiful, Polished, and Wrong
When I asked AI to generate new views of the sculpture, it immediately produced beautiful images. They were also repeatedly wrong.
Sometimes the man in the green shirt appeared twice. Sometimes the man in the white shirt disappeared. Sometimes the sections of tree trunk became generic wooden platforms. Sometimes the figures quietly rotated so the woman in the blue dress no longer faced west.
None of those errors looked dramatic. Each individual figure was convincing. Each image looked polished. The relationships, however, had been lost.
What the Prompt Had to Protect
So my prompts gradually changed. At first, I described appearance:
- Use the uploaded photos as references for carving style.
- Use neutral lighting.
- Use a white background.
- Match the facial features.
- Preserve the painted clothing.
- Keep the figure proportions close to the source.
Those requests protected geometry. They described what the sculpture looked like.
Eventually I realized I needed to protect something else:
- There is exactly one man wearing the green shirt.
- The woman in the blue dress faces west.
- The man in the white shirt faces south.
- The woman in the red shirt faces east.
- Each figure stands on a separate section cut from the same tree trunk.
- The bark and growth rings must preserve the sense of shared provenance.
- Rotate only the camera. Do not change the scene.
Those were not aesthetic corrections. They were statements about reality. I was not asking AI to produce prettier images. I was defining the invariants of the scene.
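One way to make that list concrete is to write the invariants down as data and check a description of each output against them. The following is a minimal Python sketch, not part of the workflow I actually used: the `Figure` record, the `check_scene` function, and the idea of transcribing each image into labels by hand are all assumptions for illustration. The green-shirt figure's direction is not stated in the list above, so the sketch leaves it unconstrained.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    label: str   # e.g. "woman in blue dress"
    facing: str  # cardinal direction in the scene, not in the camera frame
    base: str    # what the figure stands on


# Invariants copied from the list above. None means "not stated in the text".
REQUIRED_FACING = {
    "man in green shirt": None,
    "woman in blue dress": "west",
    "man in white shirt": "south",
    "woman in red shirt": "east",
}
REQUIRED_BASE = "trunk section"


def check_scene(figures):
    """Return a list of human-readable invariant violations (empty if none)."""
    problems = []
    counts = Counter(f.label for f in figures)

    # Every cast member appears exactly once: no duplicates, no omissions.
    for label in REQUIRED_FACING:
        if counts[label] != 1:
            problems.append(
                f"expected exactly one {label}, found {counts[label]}"
            )

    for f in figures:
        if f.label not in REQUIRED_FACING:
            problems.append(f"unexpected figure: {f.label}")
            continue
        want = REQUIRED_FACING[f.label]
        if want is not None and f.facing != want:
            problems.append(f"{f.label} faces {f.facing}, expected {want}")
        if f.base != REQUIRED_BASE:
            problems.append(
                f"{f.label} stands on {f.base}, expected {REQUIRED_BASE}"
            )
    return problems
```

An output in which the green-shirt man appears twice and the white-shirt man is missing would produce two count violations, however convincing each individual figure looks. The check knows nothing about carving style or lighting, which is the point: it only covers what must remain true.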
Trying to Build a 3D Model From Too Few Images
What I was really trying to do was more ambitious than making a single picture. I wanted the AI to infer a usable three-dimensional understanding of the sculpture from a small set of ordinary photographs. In a LiDAR or photogrammetry workflow, that would already be a warning sign. A few photos can show color, pose, and some visible edges, but they do not fully define the hidden sides, the underside of the base, the exact spacing between figures, or the way each trunk section continues around the circle.
A real 3D reconstruction needs redundant observations. You want overlapping views, camera positions, scale references, and enough coverage that the object can be solved instead of guessed. Here, the AI had to do something different. It had to look at limited references, infer the missing geometry, and then render new views that had never been photographed. That is where the process became revealing.
The first results looked like plausible miniature versions of the artwork. The carved texture was often convincing. The figures had the right general material language: painted clothing, rough-hewn faces, heavy hands, blocky shoes, and the chisel-cut surface quality of Balkenhol's carved wood. But a plausible miniature is not the same as a faithful reconstruction.
The hardest instruction was not "make it look carved from wood." The hardest instruction was "this is the same object seen from another angle." Every new camera angle gave the model a chance to invent a new sculpture.
That distinction became the center of the experiment. If I asked for one image, the model could satisfy the prompt by making a good-looking approximation. If I asked for a side view, a top view, and a quarter-angle view of the same scene, the task changed. The output now had to preserve identity across views.
Identity across views is easy to underestimate. The woman in the blue dress is not just "a woman in blue." She has a position, a direction, a body shape, a hand pose, and a relationship to the base. The man in the green shirt is not just a color swatch. He occupies one part of the circular arrangement, and duplicating him changes the artwork. The man in the white shirt is not interchangeable with another white-shirt figure turned ninety degrees. The woman in red is not just a red-shirt person; her pose and orientation are part of the spatial logic.
When I pushed the model toward side views, the weaknesses became clearer. It could produce a side profile of a carved person. It could produce two figures standing on a log base. But it often failed to maintain the parts of the arrangement that the new camera position could not see. The hidden parts of the sculpture were not being carried forward as a stable model. They were being regenerated each time from the prompt.
One output added a railing that is not part of the sculpture, and that mistake was a useful reminder. AI does not only hallucinate dramatic objects. It also adds believable support structures, tidies awkward gaps, centers compositions, and normalizes scenes toward what it has learned images are supposed to look like. Those additions can make a result feel more finished while moving it farther from the original evidence.
So the workflow became iterative. I would inspect an output, identify the invariant it had broken, and rewrite the next prompt around that failure. If the base became four unrelated blocks, I emphasized the shared circular log cross-section, bark edges, and growth rings. If a person was duplicated, I named the exact cast. If the model changed orientation, I described each figure as facing a cardinal direction. If a new object appeared, I stripped the request back to neutral background, sculpture only.
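That loop can be sketched as a lookup from observed failure to the clause that gets added to the next prompt. This is a hedged Python illustration rather than a tool I built: the failure labels, the `next_prompt` function, and the exact clause wording are hypothetical, paraphrased from the corrections described above.

```python
# Hypothetical base prompt, assembled from instructions quoted in this post.
BASE_PROMPT = (
    "Sculpture only, neutral background. "
    "Rotate only the camera. Do not change the scene."
)

# Hypothetical failure labels mapped to the invariant each one broke.
REINFORCEMENTS = {
    "unrelated_bases": (
        "Each figure stands on a separate section cut from the same tree "
        "trunk. Keep the bark edges and growth rings visible."
    ),
    "duplicated_figure": (
        "The cast is exactly four figures, each appearing once: the man in "
        "the green shirt, the woman in the blue dress, the man in the white "
        "shirt, and the woman in the red shirt."
    ),
    "changed_orientation": (
        "The woman in the blue dress faces west. The man in the white shirt "
        "faces south. The woman in the red shirt faces east."
    ),
    "extra_object": "Do not add railings or any other new objects.",
}


def next_prompt(failures, base=BASE_PROMPT):
    """Build the next prompt from the invariants the last output broke."""
    clauses = [base]
    # dict.fromkeys drops repeated labels while keeping first-seen order.
    for failure in dict.fromkeys(failures):
        if failure not in REINFORCEMENTS:
            raise ValueError(f"unknown failure label: {failure!r}")
        clauses.append(REINFORCEMENTS[failure])
    return "\n".join(clauses)
```

Calling `next_prompt(["duplicated_figure", "extra_object"])` returns the base prompt followed by those two clauses. The judgment stays with the person inspecting the image: the code only records which invariant was broken, not how to notice that it was.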
The closest outputs were useful, but they were not final truth. They were interpretive renderings. They helped me see what the AI could infer from sparse photographs, and they also made visible what it could not responsibly know. The missing information did not disappear because the image looked confident.
That is why I would not call the result a finished 3D model. It was more like a visual hypothesis. The AI could approximate a model-like understanding, but it did not maintain a reliable underlying object that could be rotated, measured, inspected, or trusted from every side. For that, I would want actual reality-capture data: a photogrammetry set with controlled overlap, LiDAR, structured-light scans, or direct measurements.
Still, the attempt was worth it. The failures taught the main lesson of the post more clearly than a perfect output would have. The model was good at style. It was weaker at persistent spatial truth. It could imitate carved wood, but it struggled to remember the exact set of relationships that made the sculpture itself.
Appearance, Topology, Provenance
That experience changed how I think about AI. People often describe the human role as "writing better prompts." I do not think that is the interesting part.
The human role is holding context. It is recognizing when a fluent answer is structurally wrong. It is knowing which details are decorative and which define the thing itself.
There is a difference between appearance and provenance.
Appearance asks: What does this object look like?
Provenance asks: What makes these four objects one artwork instead of four unrelated sculptures?
The answer is not only the paint. It is not only the carving. It is not even only the geometry. It is the shared origin embodied in those sections of tree trunk.
That is why the AI mistakes were so interesting. The model kept trying to make the bases look right. I kept needing to insist on where they came from. Those are different kinds of information.
Geometry describes appearance: where something is, how large it is, how it is oriented, what color it is, how light falls across it.
Topology describes relationships: what is connected, what is adjacent, what remains true when you walk around the object.
Provenance goes one step deeper. It describes origin. What something came from. What history it carries. What makes separate pieces belong to the same story.
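The difference between view-dependent geometry and a relationship that survives walking around the object can be shown with a little arithmetic. The Python sketch below is an assumption-laden toy: it treats headings as compass degrees measured clockwise from north, models the viewer as a single yaw angle, and uses the hypothetical helpers `apparent_heading` and `relative_angle`. Only the west and south headings come from the prompt list earlier in the post.

```python
# Assumed convention: compass degrees, clockwise from north.
HEADINGS = {"north": 0, "east": 90, "south": 180, "west": 270}


def apparent_heading(world_heading_deg, camera_yaw_deg):
    """Heading of a figure relative to the direction the camera is looking."""
    return (world_heading_deg - camera_yaw_deg) % 360


def relative_angle(a_deg, b_deg):
    """Clockwise angle from heading a to heading b."""
    return (b_deg - a_deg) % 360


blue_dress = HEADINGS["west"]    # the woman in the blue dress faces west
white_shirt = HEADINGS["south"]  # the man in the white shirt faces south

for yaw in range(0, 360, 45):    # walk around the sculpture in 45-degree steps
    seen_blue = apparent_heading(blue_dress, yaw)
    seen_white = apparent_heading(white_shirt, yaw)
    # What each figure looks like from here changes with every step...
    # ...but the angle between the two figures is the same at every step.
    assert relative_angle(seen_blue, seen_white) == relative_angle(
        blue_dress, white_shirt
    )
```

The apparent headings are view-dependent values that shift with every step. The angle between the two figures is the kind of fact that a faithful new view has to preserve. Provenance has no counterpart in this arithmetic at all, which is part of why it had to be stated in words.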
Systems Have Provenance Too
The sculpture forced me to think in those terms. The viewpoint could change. The camera could move. The lighting could change. None of that changed the sculpture.
What could not change was the structure of the relationships. The figures remained the same people. They remained on sections of the same trunk. They remained materially related. They remained one artwork.
That distinction extends far beyond image generation.
People in an organization do not literally stand on the same platform. But they may share a common history, purpose, culture, or architecture. Software services do not occupy the same memory. But they may inherit assumptions from the same codebase, data model, or system design. Departments in a hospital are not physically attached. But they are connected through patients, clinicians, equipment, procedures, and information.
A city is not just buildings. It is streets, histories, utilities, neighborhoods, and flows of people. An ecosystem is not simply species. It is food webs, habitats, and interactions.
You can reproduce every individual object while still getting the world wrong.
What Humans Still Preserve
The same lesson applies to AI. These systems are extraordinarily good at generating plausible local detail. They are increasingly capable of producing beautiful outputs. But beauty is not the same as faithfulness.
Someone still has to remember what must remain true.
That is where I think humans continue to matter. Not because we are better at drawing. Not because we can always write better prompts. Because we can preserve context.
The plaque beside the sculpture explains that the UCSF Mission Bay art collection was created to make the campus "a stimulating and pleasant place to work and visit." The sculpture is not an isolated object. It lives within an atrium filled with research, patient care, students, visitors, architecture, donor walls, and public gathering spaces. Its meaning comes partly from those surroundings.
Context is not background.
Context is structure.
The more I worked with AI, the more I realized I was not teaching it what the sculpture looked like. I was teaching it what was allowed to change.
Perspective could change. The relationships could not. The provenance could not.
AI became dramatically more useful once I stopped asking it to reproduce objects and started asking it to preserve origin. The figures could be viewed from any angle, but they had to remain what they always were: four people carved separately, standing apart, yet visibly emerging from the same source.
That is true of sculptures. It is true of systems. And it is increasingly true of how humans will need to work with AI.
We will not just verify outputs. We will preserve the history, relationships, and context that make those outputs meaningful.
That is the future I hope we build toward: AI that helps us explore the world from many perspectives while preserving the underlying structure that makes it the same world.
Different directions. Common origin. Still one work.