AI Intelligence Training

Category: Intelligence | Priority: Normal | Tags: ai, training, intelligence, machine-learning, models

IdentityCenter's Intelligence engine can be trained on your specific environment to improve the accuracy of anomaly detection, peer group analysis, and contextual insights. This article covers the training process, what data is used, how it improves your results, and configuration options.

Why Train the AI?

Out of the box, IdentityCenter applies general-purpose analysis rules -- standard thresholds for inactivity, common patterns for privileged access risk, and default peer groupings. While these defaults are effective for many environments, training the AI on your specific data enables it to:

| Capability | Before Training | After Training |
|---|---|---|
| Anomaly Detection | Flags based on generic thresholds | Flags based on your environment's actual baselines |
| Peer Group Analysis | Groups users by department/title only | Groups users by observed access patterns and behavior |
| False Positive Rate | Higher -- generic rules do not account for your unique patterns | Lower -- the model understands what is "normal" in your environment |
| Risk Scoring Accuracy | Reasonable estimates based on standard factors | Calibrated scores reflecting your organization's actual risk profile |
| Insight Relevance | General security advice | Environment-specific recommendations |

What Data Is Used for Training

The training process analyzes data that has already been synced into IdentityCenter. No external data sources are required, and no data is sent outside your environment.

Data Categories

| Data Category | Examples | Used For |
|---|---|---|
| Directory Objects | Users, computers, groups, OUs, contacts, service accounts | Understanding your identity landscape |
| Login Timestamps | lastLogon, lastLogonTimestamp | Establishing baseline activity patterns |
| Group Memberships | Direct and nested memberships | Mapping access patterns and peer relationships |
| Organizational Structure | Department, title, manager, location, division | Building peer groups and detecting org anomalies |
| Account Attributes | UAC flags, password policies, delegation settings | Learning your security configuration norms |
| Historical Changes | Audit log entries, attribute change history | Understanding change velocity and patterns |

Data Not Used

The following data is explicitly excluded from the training process:

  • Password hashes or password content
  • Authentication tokens or credentials
  • Personal data not relevant to access patterns (e.g., home addresses, personal phone numbers)
  • Data from disconnected or disabled source connections

The Training Process

Step 1: Initiate Training

Navigate to the Intelligence Center in IdentityCenter and select AI Intelligence Training. The training module is accessible to users with the Administrator role.

Before starting training, the system validates that:

  • At least one source connection is configured and has completed a sync
  • A minimum number of objects are present (recommended: 100+ user objects)
  • The LLM provider is configured (see Configuring the LLM Provider)

Click Start Training to begin the process.
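The pre-flight checks above can be sketched as a simple validation routine. This is an illustrative sketch only; the names (`EnvironmentState`, `validate_training_prerequisites`) are hypothetical and are not IdentityCenter's actual API:

```python
# Illustrative pre-training validation mirroring the checks listed above.
# All names here are hypothetical, not IdentityCenter internals.
from dataclasses import dataclass

MIN_RECOMMENDED_USERS = 100  # recommended minimum from this article

@dataclass
class EnvironmentState:
    synced_connections: int        # source connections with a completed sync
    user_object_count: int         # user objects present in the data store
    llm_provider_configured: bool  # see "Configuring the LLM Provider"

def validate_training_prerequisites(env: EnvironmentState) -> list[str]:
    """Return a list of blocking issues; an empty list means training can start."""
    issues = []
    if env.synced_connections < 1:
        issues.append("No source connection has completed a sync.")
    if env.user_object_count < MIN_RECOMMENDED_USERS:
        issues.append(f"Fewer than {MIN_RECOMMENDED_USERS} user objects present.")
    if not env.llm_provider_configured:
        issues.append("LLM provider is not configured.")
    return issues
```

If any check fails, the UI surfaces the issue before the Start Training button takes effect.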

Step 2: Data Analysis Phase

During this phase, the IntelligenceDataRepository collects and preprocesses the training data:

| Activity | Description | Duration |
|---|---|---|
| Object Census | Counts and categorizes all directory objects | Seconds |
| Pattern Extraction | Identifies common access patterns, group structures, and organizational hierarchies | 1-5 minutes |
| Baseline Computation | Calculates baseline metrics for login frequency, group membership counts, and access levels per peer group | 2-10 minutes |
| Anomaly Calibration | Determines appropriate thresholds for anomaly detection based on your data's distribution | 1-5 minutes |
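The Baseline Computation step can be pictured as computing per-peer-group statistics for each metric. The sketch below, with made-up data shapes, shows the idea for one metric (group-membership count); it is not the actual IdentityCenter implementation:

```python
# Illustrative baseline computation: per-peer-group mean and standard
# deviation of a single metric (group-membership count). Hypothetical sketch.
from collections import defaultdict
from statistics import mean, pstdev

def compute_baselines(users):
    """users: iterable of (peer_group, membership_count) pairs.
    Returns {peer_group: (mean, std_dev)}."""
    by_group = defaultdict(list)
    for peer_group, count in users:
        by_group[peer_group].append(count)
    return {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
```

The real pipeline computes analogous baselines for login frequency and access levels as well.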

Step 3: Model Calibration

The system uses the analyzed data to calibrate its detection models:

  • Peer group boundaries are refined based on observed clustering in access patterns
  • Inactivity thresholds are adjusted based on your environment's login distribution
  • Risk score weights are fine-tuned to reflect the actual risk factors present in your data
  • Anomaly sensitivity is calibrated to minimize false positives while catching genuine outliers
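One common way to calibrate thresholds from a data distribution, as described above, is to pick a high percentile of the observed values rather than a fixed generic cutoff. This is a hypothetical sketch of that idea, not IdentityCenter's actual algorithm:

```python
# Illustrative threshold calibration: anchor the anomaly threshold at a high
# percentile of the observed distribution (nearest-rank method).
import math

def calibrate_threshold(values, percentile=0.99):
    """Return the value at the given percentile of `values`."""
    ordered = sorted(values)
    rank = math.ceil(percentile * len(ordered))  # nearest-rank: 1-based position
    return ordered[min(rank, len(ordered)) - 1]
```

With a 99th-percentile threshold, only values larger than roughly 99% of the environment's observed data are flagged, which directly trades sensitivity against false positives.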

Step 4: Completion

When training completes, the Intelligence Center displays:

  • A summary of what was learned (number of peer groups identified, baseline metrics established)
  • The estimated improvement in detection accuracy
  • The timestamp of the training completion
  • A recommendation for when to retrain

Progress Tracking

During training, a progress indicator shows:

| Phase | Progress Range | Description |
|---|---|---|
| Initializing | 0-5% | Validating prerequisites and preparing data pipeline |
| Collecting Data | 5-25% | Querying and assembling training data from the data store |
| Analyzing Patterns | 25-60% | Running pattern extraction and baseline computation |
| Calibrating Models | 60-90% | Adjusting detection thresholds and peer group definitions |
| Finalizing | 90-100% | Persisting results and updating the active models |

You can continue using IdentityCenter normally while training runs in the background. Training does not lock any features or block sync operations.
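The phase-to-overall-progress mapping in the table above can be sketched as a simple linear interpolation within each phase's range (an illustrative sketch, not the product's internal code):

```python
# Illustrative mapping of per-phase progress onto the overall 0-100% ranges
# shown in the progress table.
PHASE_RANGES = {
    "Initializing": (0, 5),
    "Collecting Data": (5, 25),
    "Analyzing Patterns": (25, 60),
    "Calibrating Models": (60, 90),
    "Finalizing": (90, 100),
}

def overall_progress(phase: str, phase_fraction: float) -> float:
    """phase_fraction is 0.0-1.0 within the current phase."""
    start, end = PHASE_RANGES[phase]
    return start + (end - start) * phase_fraction
```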

How Training Improves Insights

More Accurate Anomaly Detection

Before training, a user with 45 group memberships might be flagged simply because 45 exceeds a generic threshold. After training, the system knows that engineers in your organization typically have 30-50 group memberships, so this user is within normal range -- while a marketing user with 45 memberships would still be flagged as an anomaly.
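The peer-relative check in this example can be sketched as a lookup against learned per-group ranges. The baseline numbers below are illustrative values matching the example, not real trained output:

```python
# Illustrative peer-relative anomaly check: the same value (45 memberships)
# is normal for one peer group and anomalous for another.
# Baseline ranges are made up to match the example in the text.
BASELINES = {
    "engineering": (30, 50),  # typical membership range learned in training
    "marketing": (5, 20),
}

def is_anomalous(peer_group: str, membership_count: int) -> bool:
    low, high = BASELINES[peer_group]
    return not (low <= membership_count <= high)
```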

Better Peer Group Analysis

Training refines peer groups beyond simple department and title matching. The system may discover that:

  • Users in "Engineering - Platform" and "Engineering - Infrastructure" share access patterns even though they are in different sub-departments
  • Regional variations exist (e.g., EMEA users have different typical access than US users)
  • Certain job titles in your organization carry different access expectations than industry norms
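One standard way such cross-department similarity can be detected is by comparing users' group-membership sets with Jaccard similarity. This is a hypothetical sketch of the idea, with a made-up threshold and data:

```python
# Illustrative access-pattern similarity: Jaccard overlap between two users'
# group-membership sets. A high overlap across sub-departments suggests a
# shared de facto peer group. Threshold is hypothetical.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def same_peer_group(user_a: set, user_b: set, threshold: float = 0.6) -> bool:
    return jaccard(user_a, user_b) >= threshold
```

A Platform engineer and an Infrastructure engineer sharing most of their groups would land in one refined peer group even though their department strings differ.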

Reduced False Positives

The most immediate benefit of training is a reduction in false positive alerts and findings. By understanding what is normal in your environment, the Intelligence engine stops flagging expected patterns and focuses on genuine anomalies.

| Metric | Before Training | After Training |
|---|---|---|
| Anomaly alerts per day | ~120 | ~35 |
| False positive rate | ~60% | ~15% |
| Mean time to investigate | 12 minutes per alert | 8 minutes per alert |
| Actionable finding rate | ~40% | ~85% |

(Values are illustrative and will vary based on environment size and data quality.)

When to Retrain

The AI model should be retrained periodically and in response to significant changes:

| Trigger | Reason |
|---|---|
| Quarterly schedule | Regular recalibration ensures the model stays current as your environment evolves |
| Major organizational restructure | Department changes, mergers, or reorganizations change what "normal" looks like |
| New source connection added | A new directory source introduces objects the model has not seen |
| Significant headcount change | Large onboarding or offboarding events shift baselines |
| After policy changes | New security policies may change expected access patterns |
| Rising false positive rate | If alert noise increases, the model may need recalibration |

Tip: Set a calendar reminder for quarterly retraining. The process is non-disruptive and typically completes within 15-30 minutes for environments with up to 50,000 objects.
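The retrain triggers above can be combined into a simple decision rule. The thresholds here (90 days, 15% headcount drift, 30% false-positive rate) are hypothetical examples, not product defaults:

```python
# Illustrative retrain decision combining the triggers in the table above.
# All thresholds are hypothetical, not IdentityCenter defaults.
from datetime import datetime, timedelta

def should_retrain(last_trained, headcount_change_pct, false_positive_rate, now=None):
    """Return True if any retrain trigger fires."""
    now = now or datetime.utcnow()
    if now - last_trained > timedelta(days=90):   # quarterly schedule elapsed
        return True
    if abs(headcount_change_pct) > 0.15:          # significant headcount change
        return True
    if false_positive_rate > 0.30:                # rising alert noise
        return True
    return False
```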

Privacy and Data Security

All AI training in IdentityCenter is performed locally. No identity data, directory attributes, or training results are transmitted to external services during training; the only outbound calls are LLM inference requests for insight generation, which carry structured prompts rather than raw directory data.

| Concern | How It Is Addressed |
|---|---|
| Data residency | All training data stays within your IdentityCenter database |
| External API calls | The LLM is called for insight generation (not training). API calls contain structured prompts, not raw directory data |
| Data retention | Training results are stored in the IntelligenceDataRepository within your database |
| Access control | Only administrators can initiate training or view training results |
| Audit trail | Training initiation and completion are logged in the audit system |

Configuring the LLM Provider

The LLM provider is configured in the ChatAI Settings page, accessible from Administration > AI Settings.

Supported Providers

| Provider | Model | Configuration |
|---|---|---|
| Anthropic | Claude | API key, model selection |

Configuration Steps

  1. Navigate to Administration > AI Settings (ChatAISettings.razor)
  2. Select Anthropic as the LLM provider
  3. Enter your Anthropic API key
  4. Select the Claude model variant to use
  5. Click Save and Test Connection to verify

Important: The API key is stored encrypted in the IdentityCenter database. It is used only for LLM inference calls (insight generation, chat responses) and is not transmitted to any other service.

API Usage

The LLM API is used for:

  • Generating narrative insight text from structured analyzer findings
  • Processing natural language chat messages in ChatHub
  • Creating executive briefing summaries

The LLM is not used for the core training process (pattern analysis, baseline computation, model calibration). Those operations are performed locally using statistical analysis.
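This separation of concerns can be pictured as follows: local statistical analysis produces structured findings, and only those findings (not raw directory attributes) are formatted into the prompt sent to the LLM. The field names and prompt wording below are hypothetical:

```python
# Illustrative prompt construction: only structured analyzer output reaches
# the LLM, never raw directory data. Field names are hypothetical.
def build_insight_prompt(finding: dict) -> str:
    """Format a structured analyzer finding as an LLM prompt."""
    return (
        "Write a short, plain-language security insight for this finding:\n"
        f"- type: {finding['type']}\n"
        f"- peer_group: {finding['peer_group']}\n"
        f"- observed: {finding['observed']}\n"
        f"- baseline: {finding['baseline']}\n"
        "Do not speculate beyond the data provided."
    )
```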

Monitoring Training Effectiveness

After training, monitor these metrics in the Intelligence Center to assess effectiveness:

| Metric | Where to Find It | Good Trend |
|---|---|---|
| False Positive Rate | Intelligence Dashboard | Decreasing |
| Alert Volume | Intelligence Dashboard | Decreasing (fewer noise alerts) |
| Actionable Finding Rate | Intelligence Dashboard | Increasing |
| Peer Group Coverage | Training Results Summary | >90% of users assigned to a peer group |
| Baseline Stability | Training Results Summary | Baselines converge (low variance between training runs) |


Related Articles

Intelligence Hub Overview
Contextual Insights
Risk Scoring