title: AI Intelligence Training
category: Intelligence
tags: ai, training, intelligence, machine-learning, models
priority: Normal
AI Intelligence Training
IdentityCenter's Intelligence engine can be trained on your specific environment to improve the accuracy of anomaly detection, peer group analysis, and contextual insights. This article covers the training process, what data is used, how it improves your results, and configuration options.
Why Train the AI?
Out of the box, IdentityCenter applies general-purpose analysis rules -- standard thresholds for inactivity, common patterns for privileged access risk, and default peer groupings. While these defaults are effective for many environments, training the AI on your specific data improves each of the following capabilities:
| Capability | Before Training | After Training |
|---|---|---|
| Anomaly Detection | Flags based on generic thresholds | Flags based on your environment's actual baselines |
| Peer Group Analysis | Groups users by department/title only | Groups users by observed access patterns and behavior |
| False Positive Rate | Higher -- generic rules do not account for your unique patterns | Lower -- the model understands what is "normal" in your environment |
| Risk Scoring Accuracy | Reasonable estimates based on standard factors | Calibrated scores reflecting your organization's actual risk profile |
| Insight Relevance | General security advice | Environment-specific recommendations |
What Data Is Used for Training
The training process analyzes data that has already been synced into IdentityCenter. No external data sources are required, and no data is sent outside your environment.
Data Categories
| Data Category | Examples | Used For |
|---|---|---|
| Directory Objects | Users, computers, groups, OUs, contacts, service accounts | Understanding your identity landscape |
| Login Timestamps | lastLogon, lastLogonTimestamp | Establishing baseline activity patterns |
| Group Memberships | Direct and nested memberships | Mapping access patterns and peer relationships |
| Organizational Structure | Department, title, manager, location, division | Building peer groups and detecting org anomalies |
| Account Attributes | UAC flags, password policies, delegation settings | Learning your security configuration norms |
| Historical Changes | Audit log entries, attribute change history | Understanding change velocity and patterns |
Data Not Used
The following data is explicitly excluded from the training process:
- Password hashes or password content
- Authentication tokens or credentials
- Personal data not relevant to access patterns (e.g., home addresses, personal phone numbers)
- Data from disconnected or disabled source connections
The Training Process
Step 1: Initiate Training
Navigate to the Intelligence Center in IdentityCenter and select AI Intelligence Training. The training module is accessible to users with the Administrator role.
Before starting training, the system validates that:
- At least one source connection is configured and has completed a sync
- A minimum number of objects is present (recommended: 100+ user objects)
- The LLM provider is configured (see Configuring the LLM Provider)
Click Start Training to begin the process.
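The prerequisite checks above can be sketched as a simple validation routine. This is a minimal illustration; the `Environment` fields and `validate_training_prerequisites` are hypothetical names, not IdentityCenter's actual API, and the 100-object minimum mirrors the recommendation above:

```python
from dataclasses import dataclass

MIN_USER_OBJECTS = 100  # recommended minimum from the prerequisites above

@dataclass
class Environment:
    synced_connections: int        # source connections with a completed sync
    user_object_count: int         # user objects present in the data store
    llm_provider_configured: bool  # see "Configuring the LLM Provider"

def validate_training_prerequisites(env: Environment) -> list[str]:
    """Return human-readable problems; an empty list means training may start."""
    problems = []
    if env.synced_connections < 1:
        problems.append("No source connection has completed a sync")
    if env.user_object_count < MIN_USER_OBJECTS:
        problems.append(f"Fewer than {MIN_USER_OBJECTS} user objects present")
    if not env.llm_provider_configured:
        problems.append("LLM provider is not configured")
    return problems
```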
Step 2: Data Analysis Phase
During this phase, the IntelligenceDataRepository collects and preprocesses the training data:
| Activity | Description | Duration |
|---|---|---|
| Object Census | Counts and categorizes all directory objects | Seconds |
| Pattern Extraction | Identifies common access patterns, group structures, and organizational hierarchies | 1-5 minutes |
| Baseline Computation | Calculates baseline metrics for login frequency, group membership counts, and access levels per peer group | 2-10 minutes |
| Anomaly Calibration | Determines appropriate thresholds for anomaly detection based on your data's distribution | 1-5 minutes |
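The Baseline Computation activity above can be illustrated with a small sketch: compute per-peer-group baselines (mean and standard deviation) of group-membership counts. This is a simplified illustration in Python, not the product's actual implementation, and the record fields are assumed:

```python
import statistics
from collections import defaultdict

def compute_baselines(users):
    """Per-peer-group baseline of group-membership counts (mean and stdev)."""
    by_group = defaultdict(list)
    for user in users:
        by_group[user["peer_group"]].append(user["membership_count"])
    return {
        group: {
            "mean": statistics.mean(counts),
            "stdev": statistics.stdev(counts) if len(counts) > 1 else 0.0,
        }
        for group, counts in by_group.items()
    }
```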
Step 3: Model Calibration
The system uses the analyzed data to calibrate its detection models:
- Peer group boundaries are refined based on observed clustering in access patterns
- Inactivity thresholds are adjusted based on your environment's login distribution
- Risk score weights are fine-tuned to reflect the actual risk factors present in your data
- Anomaly sensitivity is calibrated to minimize false positives while catching genuine outliers
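Calibrating a threshold from the data's own distribution, as described above, might amount to a percentile cut: only the most extreme observed values exceed it. A minimal sketch using the nearest-rank method (the 99th-percentile default is illustrative, not IdentityCenter's actual setting):

```python
import math

def calibrate_threshold(values, percentile=99.0):
    """Nearest-rank percentile of the observed distribution, used as a
    flagging threshold so only the most extreme observations are flagged."""
    ordered = sorted(values)
    rank = math.ceil(percentile / 100 * len(ordered))
    return ordered[max(0, min(rank, len(ordered)) - 1)]
```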
Step 4: Completion
When training completes, the Intelligence Center displays:
- A summary of what was learned (number of peer groups identified, baseline metrics established)
- The estimated improvement in detection accuracy
- The timestamp of the training completion
- A recommendation for when to retrain
Progress Tracking
During training, a progress indicator shows:
| Phase | Progress Range | Description |
|---|---|---|
| Initializing | 0-5% | Validating prerequisites and preparing data pipeline |
| Collecting Data | 5-25% | Querying and assembling training data from the data store |
| Analyzing Patterns | 25-60% | Running pattern extraction and baseline computation |
| Calibrating Models | 60-90% | Adjusting detection thresholds and peer group definitions |
| Finalizing | 90-100% | Persisting results and updating the active models |
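Mapping a raw progress percentage back to a phase name from the table above is straightforward; a minimal sketch:

```python
# Upper bound of each phase's progress range, from the table above
PHASES = [
    (5, "Initializing"),
    (25, "Collecting Data"),
    (60, "Analyzing Patterns"),
    (90, "Calibrating Models"),
    (100, "Finalizing"),
]

def phase_for(percent: float) -> str:
    """Map a raw progress percentage to its phase name."""
    for upper_bound, name in PHASES:
        if percent <= upper_bound:
            return name
    return "Finalizing"
```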
You can continue using IdentityCenter normally while training runs in the background. Training does not lock any features or block sync operations.
How Training Improves Insights
More Accurate Anomaly Detection
Before training, a user with 45 group memberships might be flagged simply because 45 exceeds a generic threshold. After training, the system knows that engineers in your organization typically have 30-50 group memberships, so this user is within normal range -- while a marketing user with 45 memberships would still be flagged as an anomaly.
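The peer-group-aware flagging described above amounts to comparing a value against its own group's baseline rather than a global threshold. A hedged sketch using a z-score test (the 3.0 cutoff, the baseline shape, and the sample numbers are illustrative assumptions):

```python
def is_anomalous(value, baseline, z_cutoff=3.0):
    """Flag a value only when it deviates strongly from its peer group's
    baseline (illustrative z-score test; the cutoff is an assumption)."""
    if baseline["stdev"] == 0:
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) / baseline["stdev"] > z_cutoff

engineering = {"mean": 40, "stdev": 5}  # engineers: 30-50 memberships is normal
marketing = {"mean": 12, "stdev": 3}    # marketing: far fewer memberships
```

With these baselines, 45 memberships is unremarkable for an engineer but a strong outlier for a marketing user, matching the example above.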
Better Peer Group Analysis
Training refines peer groups beyond simple department and title matching. The system may discover that:
- Users in "Engineering - Platform" and "Engineering - Infrastructure" share access patterns even though they are in different sub-departments
- Regional variations exist (e.g., EMEA users have different typical access than US users)
- Certain job titles in your organization carry different access expectations than industry norms
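One common way to discover the kind of cross-department similarity described above is set overlap over group memberships (Jaccard similarity). A small illustration with hypothetical group names -- not IdentityCenter's actual clustering algorithm:

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two users' group-membership sets (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical memberships for users in three teams
platform = {"vpn-users", "git-users", "k8s-prod", "ci-admins"}
infra = {"vpn-users", "git-users", "k8s-prod", "terraform-admins"}
marketing = {"vpn-users", "crm-users", "social-tools"}
```

Here the two engineering sub-departments overlap heavily (0.6) while platform and marketing barely overlap, so a behavioral peer grouping would merge the former despite their different department labels.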
Reduced False Positives
The most immediate benefit of training is a reduction in false positive alerts and findings. By understanding what is normal in your environment, the Intelligence engine stops flagging expected patterns and focuses on genuine anomalies.
| Metric | Before Training | After Training |
|---|---|---|
| Anomaly alerts per day | ~120 | ~35 |
| False positive rate | ~60% | ~15% |
| Mean time to investigate | 12 minutes per alert | 8 minutes per alert |
| Actionable finding rate | ~40% | ~85% |
(Values are illustrative and will vary based on environment size and data quality.)
When to Retrain
The AI model should be retrained periodically and in response to significant changes:
| Trigger | Reason |
|---|---|
| Quarterly schedule | Regular recalibration ensures the model stays current as your environment evolves |
| Major organizational restructure | Department changes, mergers, or reorganizations change what "normal" looks like |
| New source connection added | A new directory source introduces objects the model has not seen |
| Significant headcount change | Large onboarding or offboarding events shift baselines |
| After policy changes | New security policies may change expected access patterns |
| Rising false positive rate | If alert noise increases, the model may need recalibration |
Tip: Set a calendar reminder for quarterly retraining. The process is non-disruptive and typically completes within 15-30 minutes for environments with up to 50,000 objects.
Privacy and Data Security
All AI training in IdentityCenter is performed locally. No identity data, directory attributes, or training results are transmitted to external services as part of the training process.
| Concern | How It Is Addressed |
|---|---|
| Data residency | All training data stays within your IdentityCenter database |
| External API calls | The LLM is called for insight generation (not training). API calls contain structured prompts, not raw directory data |
| Data retention | Training results are stored in the IntelligenceDataRepository within your database |
| Access control | Only administrators can initiate training or view training results |
| Audit trail | Training initiation and completion are logged in the audit system |
Configuring the LLM Provider
The LLM provider is configured in the ChatAI Settings page, accessible from Administration > AI Settings.
Supported Providers
| Provider | Model | Configuration |
|---|---|---|
| Anthropic | Claude | API key, model selection |
Configuration Steps
1. Navigate to Administration > AI Settings (ChatAISettings.razor)
2. Select Anthropic as the LLM provider
3. Enter your Anthropic API key
4. Select the Claude model variant to use
5. Click Save and Test Connection to verify
Important: The API key is stored encrypted in the IdentityCenter database. It is used only for LLM inference calls (insight generation, chat responses) and is not transmitted to any other service.
API Usage
The LLM API is used for:
- Generating narrative insight text from structured analyzer findings
- Processing natural language chat messages in ChatHub
- Creating executive briefing summaries
The LLM is not used for the core training process (pattern analysis, baseline computation, model calibration). Those operations are performed locally using statistical analysis.
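To illustrate the distinction, a prompt for narrative insight generation could be assembled from structured analyzer findings only, with no raw directory attributes included. This is a sketch under assumptions -- the finding fields and function name are hypothetical, not IdentityCenter's actual prompt format:

```python
def build_insight_prompt(findings):
    """Assemble a structured prompt from analyzer findings. Only aggregate,
    non-identifying summaries are included -- no raw directory attributes."""
    lines = ["Summarize the following identity-security findings for an administrator:"]
    for f in findings:
        lines.append(f"- [{f['severity']}] {f['category']}: {f['summary']}")
    return "\n".join(lines)
```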
Monitoring Training Effectiveness
After training, monitor these metrics in the Intelligence Center to assess effectiveness:
| Metric | Where to Find It | Good Trend |
|---|---|---|
| False Positive Rate | Intelligence Dashboard | Decreasing |
| Alert Volume | Intelligence Dashboard | Decreasing (fewer noise alerts) |
| Actionable Finding Rate | Intelligence Dashboard | Increasing |
| Peer Group Coverage | Training Results Summary | >90% of users assigned to a peer group |
| Baseline Stability | Training Results Summary | Baselines converge (low variance between training runs) |
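Baseline stability between runs, as described in the table above, can be checked by comparing each peer group's baseline mean across two training results. A minimal sketch (the result shape is an assumption):

```python
def baseline_drift(previous, current):
    """Relative change in each peer group's baseline mean between two
    training runs; uniformly small values indicate convergence."""
    drift = {}
    for group in previous.keys() & current.keys():
        prev_mean = previous[group]["mean"]
        drift[group] = (
            abs(current[group]["mean"] - prev_mean) / prev_mean if prev_mean else 0.0
        )
    return drift
```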
Next Steps
- Risk Scoring -- Understand how trained models improve risk score accuracy
- Contextual Insights -- See how training enhances per-object analysis
- Intelligence Hub Overview -- Full overview of the analytics platform
- Using the AI Chat -- Interact with the trained AI through ChatHub
- Natural Language Queries -- How the LLM processes your questions
- Dashboard and Reporting -- Track Intelligence metrics over time
- Security Hardening -- Secure your AI configuration