A multi-step prompt template for Nano Banana that generates a realistic, candid training moment of an animal performing a human discipline, such as a cat practicing karate or a dog lifting weights. The prompt gives detailed instructions for skill transfer, the scene container (environment), the action, realism constraints, visual syntax, and flexibility across input combinations.
<instruction> Do this for [Animal / Creature / Pet] practicing [Human Skill / Martial Art / Discipline]:
1. Skill Transfer:
Input A is an animal or creature.
Input B is a human discipline, martial art, craft, sport, or skill.
Analyze both inputs and infer:
Species Motion Traits: the natural posture, balance, limb movement, paw/wing/claw use, and body proportions of Input A.
Discipline Mechanics: the essential movement or technique associated with Input B (stance, strike, block, craft motion, or tool use).
Training Interaction: determine how a human trainer, instructor, or environment would realistically guide or test the skill.
Scale & Equipment: infer the correct training props, uniforms, or equipment sized appropriately for the animal.
2. Container:
Goal: a believable practice or training scene.
The environment should be the natural location where the discipline would occur:
dojo, gym, workshop, field, training hall, studio, kitchen station, laboratory bench, etc.
The environment should contain recognizable training objects or equipment relevant to Input B.
3. Scene:
Input A actively performs the discipline in a recognizable stance or action.
A trainer, instructor, or observer may assist by holding equipment, guiding posture, or presenting a challenge (such as a target or object).
The moment should capture the key action of the skill: striking, crafting, balancing, lifting, cutting, shaping, blocking, or demonstrating technique.
4. Realism Constraint:
Preserve realistic anatomy of Input A (fur patterns, whiskers, paws, tail movement, posture).
Any clothing or gear must support the activity rather than dominate the scene (training belt, small uniform, gloves, apron, etc.).
Tools and objects must be physically scaled so the action feels believable.
5. Visual Syntax:
Camera: candid mid-range shot, like a paused video frame of a training moment
Lighting: natural indoor lighting or soft gym lighting
Detail: realistic fur, equipment textures, and environmental details
Mood: playful but believable; a skill-demonstration moment
6. Flexibility Constraint:
This should work for any combination, such as:
cat practicing karate
dog lifting weights
raccoon woodworking
bird painting
rabbit gardening
The system must infer correct props, stance, and environment from the variable inputs without hardcoding specific animals or disciplines.
Output: ONE image, vertical or square aspect ratio, realistic candid training moment of an animal performing a human discipline with believable environment and props.
</instruction>
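
The template takes two variable inputs, written as bracketed placeholders in its first line. A minimal sketch of filling them programmatically is shown here, assuming the template is held as a plain string. The helper name `fill_template` and the two constants are hypothetical and not part of the original prompt. How the filled prompt is then sent to the image model is left out.

```python
# Placeholder strings exactly as they appear in the template's first instruction line.
ANIMAL_SLOT = "[Animal / Creature / Pet]"
SKILL_SLOT = "[Human Skill / Martial Art / Discipline]"


def fill_template(template: str, animal: str, skill: str) -> str:
    """Substitute the two variable inputs into the prompt template.

    Hypothetical helper: raises ValueError if an input is blank or a
    placeholder is missing, so a half-filled prompt is never used.
    """
    animal, skill = animal.strip(), skill.strip()
    if not animal or not skill:
        raise ValueError("animal and skill must both be non-empty")
    for slot in (ANIMAL_SLOT, SKILL_SLOT):
        if slot not in template:
            raise ValueError(f"template is missing placeholder {slot!r}")
    return template.replace(ANIMAL_SLOT, animal).replace(SKILL_SLOT, skill)


# Only the opening line is shown here; in practice pass the full template text.
header = (
    "Do this for [Animal / Creature / Pet] practicing "
    "[Human Skill / Martial Art / Discipline]:"
)
filled = fill_template(header, "cat", "karate")
print(filled)  # Do this for cat practicing karate:
```

Failing loudly on a missing placeholder is a design choice for this sketch: the rest of the prompt refers to Input A and Input B, so a template whose first line was edited or truncated would otherwise produce a prompt with no concrete subject.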

