This project aims to enable robots to effectively control their gaze by implementing a biologically inspired gaze control system. The approach is grounded in neuroscience research, particularly in the study of visual attention mechanisms. The foundational concept originates from computational models of visual attention, as introduced in the seminal work:
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1254-1259.
This paper proposed a model in which attention is guided by a combination of bottom-up (stimulus-driven) and top-down (task-driven) processes: gaze selection emerges from the interplay of spontaneous exploration, salient stimuli, and task-specific objectives.
In this project, we implement a similar multi-layered system. Gaze targets are categorized into one of three layers:
Additionally, within each layer, gaze targets are assigned both a priority and a duration. The priority determines the importance of a target relative to others within the same layer, while the duration specifies how long the target remains active. A central component, the GazeScheduler, dynamically evaluates all available gaze targets and selects the one with the highest priority for execution. This ensures that the robot's gaze is always directed toward the most relevant target based on the current context.
No publications yet.
A gaze target consists of the following main properties:
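Based on the description in this document (a priority, an active duration, and a position that is either global or relative to a robot node), a gaze target could be sketched as follows. All names and types here are illustrative assumptions, not the package's actual API:

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class GazeTarget:
    """Illustrative sketch of a gaze target; names and types are assumptions."""
    position: Tuple[float, float, float]   # target point (x, y, z)
    frame: Optional[str] = None            # robot node name; None = global frame
    priority: int = 0                      # importance relative to same-layer targets
    duration: float = 1.0                  # seconds the target remains active
    created_at: float = field(default_factory=time.monotonic)

    def is_active(self, now: Optional[float] = None) -> bool:
        """A target expires once its duration has elapsed."""
        now = time.monotonic() if now is None else now
        return now - self.created_at < self.duration
```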
The GazeScheduler is the central component responsible for managing and prioritizing gaze targets. It receives gaze targets from multiple sources through the memory system, and dynamically schedules them based on a strict priority system.
The selected gaze target will then be sent to a robot-specific gaze controller.
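The scheduling logic can be illustrated with a minimal sketch. This is not the actual GazeScheduler implementation, only the strict-priority rule described above applied to targets that expire after their duration; all names are hypothetical:

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GazeTarget:
    name: str
    priority: int            # higher priority wins
    duration: float          # seconds until the target expires
    created_at: float = field(default_factory=time.monotonic)


class GazeScheduler:
    """Sketch: keep all active targets, always select the highest-priority one."""

    def __init__(self, idle: GazeTarget):
        self.idle = idle                      # e.g. "look ahead"; never expires here
        self.targets: List[GazeTarget] = []

    def add(self, target: GazeTarget) -> None:
        self.targets.append(target)

    def reset(self) -> None:
        self.targets.clear()                  # only the idle target remains

    def select(self, now: Optional[float] = None) -> GazeTarget:
        now = time.monotonic() if now is None else now
        # Drop expired targets, then pick the most important remaining one.
        self.targets = [t for t in self.targets if now - t.created_at < t.duration]
        if not self.targets:
            return self.idle
        return max(self.targets, key=lambda t: t.priority)
```

In this sketch, resetting the scheduler mirrors the ResetGazeTargets skill below: all targets are removed and only the idle target remains.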
The robot-specific gaze controller receives a gaze target as either a global position or a position relative to a robot node. It then uses an Inverse-Kinematics-based approach to control the robot's gaze towards the target.
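As an illustration of the underlying geometry (not the package's IK solver), the head angles that point the gaze at a target can be derived with atan2, assuming the target position has already been transformed into the head frame:

```python
import math
from typing import Tuple


def gaze_angles(target_in_head: Tuple[float, float, float]) -> Tuple[float, float]:
    """Yaw/pitch (radians) that orient the gaze axis toward a point.

    Assumes x forward, y left, z up in the head frame. Purely illustrative;
    the frame convention and function name are assumptions, not the package API.
    """
    x, y, z = target_in_head
    yaw = math.atan2(y, x)                     # rotate left/right toward the target
    pitch = math.atan2(z, math.hypot(x, y))    # then tilt up/down
    return yaw, pitch
```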
Gaze Control Strategies:
The following strategies are currently implemented for controlling the robot's gaze:
- Atan2 strategy: designed specifically for a hemisphere joint configuration, in which the two degrees of freedom control the pitch and roll of the robot's head. To maintain an upright head posture, roll is constrained to zero and both degrees of freedom are controlled by the same value.

On the code level, this package contains the following libraries and components:
Core libraries:
- Generating gaze targets.
- Client API to interface with the GazeScheduler.
- Access to functionality through skills.
Core components:
- view_memory.gaze_scheduler: executes the gaze scheduling logic.

Gaze target providers:
- person_target_provider: provides gaze targets related to detected persons.
- handover_target_provider: provides gaze targets for handover scenarios.

Skills:
- view_selection_skill_provider: implements skills for view selection.

Examples:
- object_tracking_example: demonstrates object tracking functionality.
- scheduler_example: showcases an example of gaze scheduling.

The skills library implements the following skills, which are made available through the view_selection_skill_provider:
Basic skills:
| Skill Name | Parameters | Description |
|---|---|---|
| LookAt | (ARON, header, ...) | Allows setting a gaze target. |
| SetCustomGazeTarget | (None) | Allows setting a custom gaze target. TODO: check how it differs from LookAt. |
| ResetGazeTargets | (None) | Removes all gaze targets from the GazeScheduler. Only the idle target (e.g., looking ahead) will remain. |
Skills for predefined directions:
| Skill Name | Parameters | Description |
|---|---|---|
| LookUp | (None) | Directs the robot's gaze upward. |
| LookDown | (None) | Directs the robot's gaze downward. |
| LookDownstraight | (None) | Directs the robot's gaze straight down. |
| LookLeft | (None) | Directs the robot's gaze to the left. |
| LookRight | (None) | Directs the robot's gaze to the right. |
| LookAhead | (None) | Directs the robot's gaze straight ahead. |
Skills for scene exploration (looking around):
| Skill Name | Parameters | Description |
|---|---|---|
| LookForObjects | (None) | Searches the scene for objects. |
| LookForObjectsWithKinematicUnit | (None) | Searches for objects using the robot's kinematic unit. This skill is deprecated. |
| ScanLocation | (None) | Scans a location. |
| ScanLocationsForObject | (None) | Scans known locations for a specific object. |
Focusing on specific elements in the scene:
| Skill Name | Parameters | Description |
|---|---|---|
| LookAtArticulatedObjectFrame | (None) | Directs the robot's gaze to a specific frame of an articulated object. |
| LookAtObject | (None) | Directs the robot's gaze to a specific object. |
| LookAtHumanFace | (None) | Directs the robot's gaze to a human's face. |
| LookAtHumanHand | (None) | Directs the robot's gaze to a human's hand. |