Retrieving Active Directory Data into BI

Most organizations use Active Directory both to manage the users, groups and devices registered on the network and to enforce network security. This information can be extracted and analyzed to review authorization across the different applications within your organization.

Below are a few key details that can be retrieved from Active Directory:

UserID, Login Name
Employee’s First Name, Last Name
Employee Status
Job Title
User Mail ID
Employee Organization
Line Manager Details
Account Status
Member Groups

 

1. Motivation

In most business cases, HR systems hold information only on users who are on the organization's payroll, and not on contractors or external vendors working with the organization. Extracting Active Directory data into the enterprise data warehouse fills this gap: it provides the full list of people (direct employees, contractors, vendors, etc.) who access internal applications such as service manager, BI services and general services applications, along with their details. If we can pull this information into OBIEE and automate the data load process, we can generate audit reports and analyse the access rights of users to various applications.

1.1. ADManager Plus data extraction and challenges

ManageEngine ADManager Plus is a web-based solution for Active Directory management and reporting. It helps to audit the security permissions defined for a specific AD object or a specific user. Its report scheduler generates reports automatically at specified times and delivers them to multiple users by email in the desired format.

Although ADManager Plus supports generating and scheduling reports, the reports can only be delivered to email accounts. Because of this limitation, the tool on its own cannot automate extracting data from Active Directory and loading it into a data warehouse.

2. Python solution for automating data retrieval and loading

Data can be extracted from Active Directory over LDAP using an LDAP client library for Python. Here we use ldap3, which is written in pure Python. ldap3 is supported in Python versions 2.6 and above, including Python 3.

The ldap3 module needs to be installed in order to establish a connection to the server and retrieve data. It can be installed either with pip (pip install ldap3) or manually, by downloading the latest version from the ldap3 source location and running python setup.py install. Refer to the ldap3 library documentation for further details.

2.1. Defining server details and establishing connection to the LDAP server

In order to establish a connection, a Server object first needs to be defined. The Server object specifies the DSA (Directory Server Agent) LDAP server that will be used by the connection. To create it, define a host variable holding the host name, the IP address, or the complete URL (host name and port) of the LDAP server.

# import class and constants 
from ldap3 import Server, Connection, ALL

# define the server
hostname = 'servername'  # hostname of the LDAP server
server = Server(hostname, get_info=ALL)

The Connection object sends operation requests to the LDAP server. It takes parameters such as the server, the user name and the password used to authenticate. It also accepts optional parameters, such as the authentication type and a read-only flag, that control which kinds of operation can be performed on the server.

# define the connection
username = 'xxxxx@clearpeaks.com'  # user account to access the server
password = '*********'  # password for authentication
# auto_bind is left out because bind() is called explicitly on the next line
connection = Connection(server, user=username, password=password, collect_usage=True)
connection.bind()

Once the Connection object is defined, the bind() method is executed to open the connection with the server. The bind operation exchanges credentials between the client and the server and establishes a new authorization state. Once the connection is established, all the standard LDAP operations can be performed.
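As a minimal sketch of the connection step described above, the helper below reads the host, account and password from environment variables instead of hard-coding them in the script, and checks the value returned by bind(). The variable names AD_HOST, AD_USER and AD_PASSWORD and both function names are assumptions made for illustration; they are not part of ldap3.

```python
import os


def load_ad_settings(env=None):
    """Read connection settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    names = ('AD_HOST', 'AD_USER', 'AD_PASSWORD')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError('missing settings: ' + ', '.join(missing))
    return env['AD_HOST'], env['AD_USER'], env['AD_PASSWORD']


def open_connection(hostname, username, password):
    """Bind to the LDAP server and return the open connection."""
    # Imported here so the settings helper works even without ldap3 installed.
    from ldap3 import Server, Connection, ALL

    server = Server(hostname, get_info=ALL)
    connection = Connection(server, user=username, password=password)
    # With the default settings, bind() returns False when the server
    # rejects the credentials; the reason is reported in connection.result.
    if not connection.bind():
        raise RuntimeError('bind failed: %s' % connection.result.get('description'))
    return connection
```

A script would then call connection = open_connection(*load_ad_settings()) once the three variables are set for the account that runs it.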

2.2. Accessing AD information and generating files

The search() method of the Connection object performs search operations on the LDAP database. It takes the following parameters:

Search Base: the base DN from which the search starts
Search Filter: the filter to be applied to the search request
Search Scope: how broad the search is; the possible values are BASE, LEVEL and SUBTREE
Attributes: a single attribute or a list of attributes to be returned by the search
Get Operational Attributes: if True, operational attributes are also returned for each entry
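Of these parameters, the search filter is the one most easily broken by special characters such as parentheses in a group name. The sketch below builds a group-membership filter and escapes the value following RFC 4515. The function names and the group DN are hypothetical; ldap3 also provides escape_filter_chars in ldap3.utils.conv for the same purpose.

```python
def escape_filter_value(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = [
        ('\\', r'\5c'),  # backslash first, so later escapes are not re-escaped
        ('*', r'\2a'),
        ('(', r'\28'),
        (')', r'\29'),
        ('\x00', r'\00'),
    ]
    for char, escaped in replacements:
        value = value.replace(char, escaped)
    return value


def member_of_filter(group_dn):
    """Build a filter matching user objects that are direct members of group_dn."""
    return '(&(objectClass=user)(memberOf=%s))' % escape_filter_value(group_dn)


# Hypothetical group DN, for illustration only.
print(member_of_filter('CN=All BI Users (EMEA),OU=Groups,DC=example,DC=com'))
# prints (&(objectClass=user)(memberOf=CN=All BI Users \28EMEA\29,OU=Groups,DC=example,DC=com))
```

Note that memberOf is matched against the full distinguished name of the group, not its display name, and that it only covers direct membership.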

The search() method takes many other parameters, such as a time limit and a size limit; refer to the ldap3 documentation for details. By default, Active Directory returns at most 1000 entries for a single search, mainly due to security constraints. Since the number of entries to retrieve will often exceed 1000, it is better to use a paged search when reading from AD.

Instead of a simple search operation, we therefore use a paged search to retrieve the full list of entries from the AD server.
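To make the paging mechanism concrete, here is a rough sketch of the loop that a paged search performs: each request asks for one page, and the server returns a cookie that must be sent back to obtain the next page. It follows the simple paged search pattern from the ldap3 documentation; the function name and the default page size of 500 are assumptions, and the sketch expects a server that supports the paged results control.

```python
# OID of the Simple Paged Results control (RFC 2696).
PAGED_RESULTS_OID = '1.2.840.113556.1.4.319'


def search_all_pages(connection, base_dn, search_filter, attributes, page_size=500):
    """Collect every entry of a search by requesting one page at a time."""
    entries = []
    cookie = None
    while True:
        connection.search(
            search_base=base_dn,
            search_filter=search_filter,
            attributes=attributes,
            paged_size=page_size,
            paged_cookie=cookie,
        )
        # Keep only real entries; referrals also appear in the response list.
        for entry in connection.response or []:
            if entry.get('type') == 'searchResEntry':
                entries.append(entry)
        # The server sends back an empty cookie after the last page.
        cookie = connection.result['controls'][PAGED_RESULTS_OID]['value']['cookie']
        if not cookie:
            return entries
```

In practice the paged_search helper in connection.extend.standard wraps this loop, so the manual version is mainly useful for understanding what happens on the wire or for adding custom handling between pages.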

# define parameters and call the paged search method
from ldap3 import SUBTREE

base_dn = 'dc=clearpeaks,dc=corp,dc=ae'  # base DN for the search operation
filter_value = 'All BI Users Group'  # group to filter on; memberOf expects the group's full DN
attrs = ['sAMAccountName', 'title', 'displayName', 'department']  # required attributes from AD

data = connection.extend.standard.paged_search(
    search_base=base_dn,
    search_filter='(memberOf=' + filter_value + ')',
    search_scope=SUBTREE,
    attributes=attrs,
    get_operational_attributes=True,
    paged_size=15,
    generator=True)

With a simple search, the entries received are stored as a list in the response attribute of the Connection object. The paged search used here returns them directly instead (as a generator when generator=True). Either way, the result can be iterated over to retrieve the entries. A few data cleansing steps are required, since Active Directory text is UTF-8 encoded and may need to be decoded or converted (for example to ASCII) before it is written out. Once retrieved, the response data is written to a CSV file, which can be saved at a shared location on the server.
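The cleansing and export step could look roughly like the sketch below. It assumes each entry is a dictionary with 'type' and 'attributes' keys, as ldap3 returns from a search. The function names, the ';' separator for multi-valued attributes and the sample entry are made up for illustration.

```python
import csv
import io


def to_text(value):
    """Decode UTF-8 bytes to text and turn any other value into a string."""
    if isinstance(value, bytes):
        return value.decode('utf-8', 'replace')
    return str(value)


def entry_to_row(entry, attribute_names):
    """Flatten one search result entry into a list of text cells."""
    attributes = entry.get('attributes', {})
    row = []
    for name in attribute_names:
        value = attributes.get(name, '')
        # Multi-valued attributes (for example memberOf) arrive as lists.
        if isinstance(value, (list, tuple)):
            value = ';'.join(to_text(item) for item in value)
        row.append(to_text(value))
    return row


def write_entries(entries, attribute_names, stream):
    """Write a header line plus one CSV row per directory entry."""
    writer = csv.writer(stream)
    writer.writerow(attribute_names)
    for entry in entries:
        # Skip referrals and anything else that is not a real entry.
        if entry.get('type') == 'searchResEntry':
            writer.writerow(entry_to_row(entry, attribute_names))


# Made-up sample entry, shaped like the dictionaries ldap3 returns.
sample = [{
    'type': 'searchResEntry',
    'dn': 'CN=J Doe,DC=example,DC=com',
    'attributes': {'sAMAccountName': 'jdoe', 'displayName': b'Jos\xc3\xa9 Doe', 'department': []},
}]
buffer = io.StringIO()
write_entries(sample, ['sAMAccountName', 'displayName', 'department'], buffer)
print(buffer.getvalue())
```

To write to a real file on Python 3, the in-memory buffer can be replaced by open(path, 'w', newline='', encoding='utf-8'), where path is the shared location read by the ETL.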

Figure 1: Sample of the response data updated to a CSV file

The files generated by the Python script can then be read by an ETL mapping that is configured to use the CSV file as its source and to load the data into warehouse tables, where it is ready to be visualized in any BI tool. Below is a sample report using LDAP data.

Figure 2: Sample of Active Directory Data displayed in BI Report

The Unix crontab utility can be used to schedule the script at whatever frequency is required to refresh the data from Active Directory. The ETLs are then scheduled accordingly to load each new set of data from the generated source file.

Conclusion

In this blog post, we have described a solution that uses Python to automate retrieving Active Directory data and loading it into BI environments. Active Directory information can be used to analyze the security privileges granted within an organization, which in turn helps to audit user access to various applications.

Although we have only covered reading Active Directory data, that is not the only benefit of this method. Once an authenticated connection is established with the server, all standard LDAP operations can be performed using the ldap3 Python library.

Contact us if you would like more information about how to pull HR information into OBIEE.

 

Shahin A
shahin.ali@clearpeaks.com