1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Understanding SQL Server Operating System

Discussion in 'SQL Server' started by MinalS, Jun 10, 2015.

  1. MinalS

    MinalS New Member

    Jul 8, 2014
    Likes Received:
    Trophy Points:
    SQLOS Architecture

    The SQLOS is the server application used for managing the operating system resources. The buffer management, scheduling, deadlock detection, exception handling, extended events, etc are managed. The operating system is called and the scheduling functions efficiently schedule the engine resources.

    It provides an API to different layers when required by the operating resources. The API helps user to write code irrespective of machine architecture.


    Functions of SQLOS
    1. Exception handling: It handles both system and user defined errors.
    2. Deadlock: It detects deadlocks and checks for tasks that hold resources and locks.
    3. Synchronization: It contains reader/writer locks on resources, mutual exclusions
    4. Memory brokers: The memory is distributed among the components in the SQL Server.
    5. Asynchronous I/O: The server requests the windows for initiating I/O. The server session is placed in I/O wait queue. Once it receives a signal from about the resource availability, it is resumed from the queue.
    6. CLR hosting: CLR inside SQLOS helps the managed code to be used in SQL Server. There are changed to memory reporting, garbage collection.
    SQL server scheduler

    The server creates a scheduler for a CPU when it is started. Every scheduler is set to ONLINE or OFFLINE depending on the affinity settings. The sp_configure_option replaces the affinity mask. The process affinity option can be modified without restarting the server.

    SQL Server tasks

    A request in SQL Server is a batch sent from client to server. Once it is received by the server, it is bound to the task, the complete process or task is handled by the server. Multiple child tasks can be created depending on the degree of parallelism. The task or worker cannot accept a new request if there is a block condition.

    SPID is connection through which the requests are sent to the server. Every SPID has a scheduler associated with it. The scheduler with low load is assigned with SPID. If subsequent request from sent using the similar ID, the scheduler with low load factor is assigned with it. A limitation to the tasks is the processing of the SPID is done on the same node.

    NUMA and schedulers

    In the NUMA configuration, each node is subset of machine processors. The number of processor is preset, if the machine is NUMA configured. The SPID is assigned to the round robin to nodes. The scheduler monitor assigns the SPID to the least load scheduler.

    Every memory node contains its lazywriter. The I/O completion port is the network listener. Node consists of resource monitor used for managing the scheduler. The scheduler monitor thread starts at a specific time interval and checks to yield the time previously the monitor was awake.

    Binding schedulers

    Ideally, the schedulers do not bind to the CPU. Once the process affinity is set, the schedulers are bound to the CPU. The value AUTO for the process affinity states that hard binding of the scheduler is not needed.

    The ALTER CONFIGURATION SET PROCESS AFFINITY CPU needs only CPUID. The range of ID using the start and end of the ID is specified. The process affinity is dependent on NUMA nodes using the ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE = < NUMA node range spec>.


    The resources of the memory are managed dynamically by the server. The SQLOS layer communicates constantly with the operating system. The buffer pool is main component of the memory. The buffer handles disk I/O functions and index pages in the cache.

    The size of buffer is similar to the index page. The buffer is referred as a page frame for holding a database page. The plan cache is used as a cache for query plans and procedures. If there are large memory blocks to be allocated, the multi page allocator is used.

    The column store object pool allocates memory from page allocator as a buffer pool. The CACHE_COLUMNSTOREOBJECTPOOL is used to track the memory. The memory is different from the buffer pool. The page flushing can occur from the data cache.

    Page management

    Performance optimization can be achieved by keeping the buffers ready. The non availability of the buffer results into unwanted search to locate the buffer. To overcome this issue, the server has a linked list of addresses of free pages. The buffer contains header with information about the page reference. The information implements the page replacement policy.

    The lazy writer process checks for the data cache. The buffer cache is present in the memory, they do not need I/O. The value is assigned to the buffer. It indicated the last access time. Once the value is reached to low, the dirty page indicator is checked. A write operation is used to commit the changes to the disk.


    The checkpoint is useful for checking the buffer cache. The dirty data pages are written to particular database to disk. The pages are combined for optimizing the I/O. The buffer is not on free list through checkpoint. The checkpoint checks the pages are modified and written to disk. The dirty pages are kept to minimum resulting in less recovery time for the server at failure.

    The server adds a record to the transaction log containing the active transactions list. The recovery process builds table with list of dirty pages. They are present at intervals, requested manually. The sp_configure recovery interval option is used for checking frequency. The recovery time is balanced. The extended events sqlserver.checkpoint_begin and sqlserver.checkpoint_end are used for monitoring the activity.

    SQL Server Resource Governor

    The server governor is used for monitoring the resource consumption for every workload. The workloads are executed on priority. The runaway queries are reduced.
    The functionality of the governor is dependent on the workloads and resource pools set up by the DBA. The workload groups can be set using the DDL commands.

    Classifier function

    Once the function is defined, the governor is enabled. It is applied to new session for the workload group assigned with the session. The session is present in the group till it is explicitly removed. One function can be active at a time. It is based on connection and checks the group dependent on the system functions.

    Resource Pools

    The pool is a subset of the physical resources. Every pool has two parts:
    1. The minimum value can be set to the resource as the values do not overlap with pools.
    2. The other part overlaps with the pool, the maximum resource consumption is possible
    The MIN value represents the minimum resource available, the MAX value states the maximum pool size.
    1. MIN_CPU_PERCENT: The server distributes bandwidth in individual requests and IMPORTANCE property for the workload group. The default value is 0.
    2. MAX_CPU_PERCENT: The maximum CPU bandwidth for all requests present in the pool.
    3. AFFINITY: The resource pool is attached to the specific scheduler. The AUTO value helps all the schedulers in the pool to be used.
    4. CAP_CPU_PERCENT: Limiting the maximum CPU bandwidth is difficult. The pool cannot exceed the CAP_CPU_PERCENT value.
    5. MIN_MEMORY_PERCENT: The memory reserved for the pool that cannot be shared with other pools.
    6. MAX_MEMORY_PERCENT: The percent of total memory for all requests in the specific pool for use. The value is always greater than or equal to MIN_MEMORY_PERCENT.
    Workload groups

    The workload group is defined by DBA for multiple connections sharing the same resources. There are two workload groups as mentioned:
    1. Internal group: The internal activities are performed by the server. The sessions cannot be added to the group. The monitoring of the group is possible.
    2. Default group: Every session is classified into a group when classifier rules cannot be applied.
    There are six properties for specifying the workload group. The list of properties is as mentioned:
    1. GROUP_MAX_REQUESTS: It is maximum number of requests simultaneously executing on the group.
    2. REQUEST_MAX_MEMORY_GRANT_PERCENT: The maximum amount of memory for a single task from the group can take from the pool.
    3. REQUEST_MAX_CPU_TIME_SEC: The maximum amount of CPU time in seconds needed by the request to consume in the group.
    4. REQUEST_MEMORY_GRANT_TIMEOUT_SEC: The maximum time in seconds the query needs to wait for the resource availability.
    5. IMPORTANCE: Every group has values low, medium, and high in the resource pool. The default value is medium.
    6. MAX_DROP: The maximum degree of parallelism for a parallel query is specified.
    7. GROUP_MAX_REQUESTS: The maximum number of requests allowed simultaneously for workload group execution.
    Extended Events

    The extended events have diagnostic collection of data for solving problems in SQL Trace. The event engine starts inside the server, provides functionality for extended events. The environment provides package metadata, event sessions, pool of dispatcher threads, etc.

    Inside the server there are individual modules that load packages into extended events engine. The packages are metadata containers providing information about targets, events, actions, maps, types, etc.

    Life cycle

    A Boolean check is performed on the task to check whether the event is enabled in an event session. If the event is not enabled for the session, the code execution for task continues. If event is enabled, the data columns and the corresponding column schema are collected.

    The active sessions in the engine having event enabled for collection are checked. The predicate evaluation is used for determining the event is executing. If the Boolean expression for the predicate evaluation is false, the event terminates and the engine returns to the original task. If the value is true, the data columns are copied into session context and the execution continues.

    The actions added to the event are executed in engine. Once the action is completed, the event data is dispatched to synchronous targets defined for the event session. The data is added to the session memory buffer for dispatching to the asynchronous targets using the dispatcher pool.
    state and shabbir like this.

Share This Page