The Console & The Compiler

Advertisements

ADO.NET – Connected Layer

Intro

The connected layer provides functionality for running SQL syntax against a connected database. The statements can be select, insert, update or delete as well as execute schema statements and the ability to run stored procedures and functions.

The connected layer requires the database connection to remain open while the database transactions are being performed.

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,

Command Objects

The Command object represents a SQL statement to be run; select, insert, update, delete, execute schema, execute stored procedure etc.

The DbCommand is configured to execute an sql statement or stored procedure via the CommandText property.

The CommandType defines the type of sql statement to be executed which is stored in the CommandText property.

CommandType

Description

StoredProcedure

CommandText contains the name of a StoredProcedure or UserFunction.

TableDirect

CommandText contains a table name. All rows and columns will be returned from the table.

Text

CommandText defines an SQL statement to be executed.

The CommandType will default to Text however it is good practice to explicitly set it.

var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;

var dbFactory = DbProviderFactories.GetFactory (providerName);

using(var cn = dbFactory.CreateConnection())
{
cn.ConnectionString = connectionString;
cn.Open();

var cmd = cn.CreateCommand();

cmd.CommandText = "Select * from MyTable";
cmd.CommandType = CommandType.Text;
cmd.Connection = cn;
}

The command should be passed the connection via the Connection property.

The command object will not touch the database until the either the ExecuteReader, ExecuteNonQuery or equivalent has been called upon.

DataReader

The DataReader class allows a readonly iterative view of the data returned from a configured command object.

A DataReader is instantiated by calling ExecuteReader upon the command object.

The DataReader exposes a forward only iterator via the Read method which returns false when the collection has been completely iterated.

The DataReader represents both the collection of records and also the individual records; access to the data is made by calling methods upon the DataReader.

DataReader implements IDisposable and should be used within a using scope.

Data can be accessed by field name and also ordinal position via the index method[] which returns the data cast into object. Alternatively methods exist to get the data for any .NET primitive type.

using (var dr = cmd.ExecuteReader ()) {
while (dr.Read ()) {
var val1 = (string)dr ["FieldName"];
var val2 = (int)dr [0];

// Get the field name or its ordinal position
var name = dr.GetName (1);
var pos = dr.GetOrdinal ("FieldName");

// Strongly Types Data
var val3 = dr.GetInt32 (1);
var val4 = dr.GetDecimal (2);
var val5 = dr.GetDouble (3);
var val6 = dr.GetString (4);

var isNull = dr.IsDBNull (5);

var fdType = dr.GetProviderSpecificFieldType (6);
}
}

The IsDBNull method can be used to determine if a value contains a null. A .NET type representation of the contained field can be retrieved with the GetProviderSpecificFieldType method.

Multiple Results

If the select statement of a command has multiple results the DataReader.NextResult() method exposes a forward only iterator through each result group:

string strSQL = "Select * From MyTable;Select * from MyOtherTable";

NextResult increments to the next result group, just like Read it returns false once the collection has been exhausted:

using (var dr = cmd.ExecuteReader ()) {
while (dr.NextResult ()) {
while (dr.Read ()) {
}
}
}

DataSet

DataReader can be used to fill a DataSet. We talk more about DataSet within the disconnected layer in the next chapter.

var dtable = new DataTable();

using(var dr = cmd.ExecuteReader())
{
dtable.Load(dr);
}

ExecuteNonQuery

ExecuteNonQuery allows execution of an insert, update or delete statement as well as executing a schema statement.

The method returns an integral representing the number of affected rows.

using (var cn = dbFactory.CreateConnection ()) {
cn.ConnectionString = connectionString;
cn.Open ();

var cmd = cn.CreateCommand ();

cmd.CommandText = @"Insert Into MyTable (FieldA) Values ('Hello’)";
cmd.CommandType = CommandType.Text;
cmd.Connection = cn;

var count = cmd.ExecuteNonQuery ();
}

Command Parameters

Command parameters allow configuring stored procedure parameters as well as parameterized sql statements which can help protect against SQL injection attacks.

It is strongly advised that any information collected from a user should be sent to the database as parameter regardless if it is to be persisted or used within a predicate.

Parameters can be added with the Command.Parameters.AddWithValue or by instantiating a DbParameter class.

The example below shows the former.

using (var cn = dbFactory.CreateConnection ()) {
cn.ConnectionString = connectionString;
cn.Open ();

var cmd = cn.CreateCommand ();

cmd.CommandText = @"Insert Into MyTable (FieldA) Values (@Hello)";
cmd.CommandType = CommandType.Text;

var param = cmd.CreateParameter ();

param.ParameterName = "@Hello";
param.DbType = DbType.String;
param.Value = "Value";
param.Direction = ParameterDirection.Input;
cmd.Parameters.Add (param);

cmd.Connection = cn;

var count = cmd.ExecuteNonQuery ();
}

The DbCommand has a DbType property which allows setting the type of the parameter, it is vendor agnostic.

SqlCommand and MySqlCommand also provide SqlDbType and MySqlDbType which can be used to set the type field from a vendor specific enum. Setting the DbType will maintain the vendor specific column and vice versa.

Executing a Stored Procedure

A stored procedure is executed by configuring a DbCommand against the name of the stored procedure along with any required parameters followed by calling ExecuteScalar or ExecuteReader.

ExecuteScalar is used to return a single value. If multiple result sets, rows and columns are returned it will return the first column from the first row in the first result set.

Parameters can either be input or output.

using(var cn = dbFactory.CreateConnection())</span></pre>
{
cn.ConnectionString = connectionString;
cn.Open();

var cmd = cn.CreateCommand();
cmd.CommandText = "spGetFoo";
cmd.Connection = cn;

cmd.CommandType = CommandType.StoredProcedure;

// Input param.
var paramOne = cmd.CreateParameter();
paramOne.ParameterName = "@inParam";
paramOne.DbType = DbType.Int32;
paramOne.Value = 1;
paramOne.Direction = ParameterDirection.Input;
cmd.Parameters.Add(paramOne);

// Output param.
var paramTwo = cmd.CreateParameter();
paramTwo.ParameterName = "@outParam";
paramTwo.DbType = DbType.String;
paramTwo.Size = 10;
paramTwo.Direction = ParameterDirection.Output;
cmd.Parameters.Add(paramTwo);

// Execute the stored proc.
var count = cmd.ExecuteScalar();

// Return output param.
var outParam = (int)cmd.Parameters["@outParam"].Value;

// This can be made on the parameter directly
var outParam2 = (int)paramTwo.Value;
}

 

Member name

Description

Input

The parameter is an input parameter (default).

InputOutput

The parameter is capable of both input and output.

Output

The parameter is an output parameter and has be suffixed with the out keyword in the parameter list of a stored procedure, built in function or user defined function.

ReturnValue

The parameter is a return value, scalar or similar but not a return set. This is determined by the return keyword in a stored procedure, built in function or user defined function.

If the stored procedure returns a set and not a single value the ExecuteReader can be used to iterate over the result:

using (var cn = dbFactory.CreateConnection ()) {
cn.ConnectionString = connectionString;
cn.Open ();

var cmd = cn.CreateCommand ();
cmd.CommandText = "spGetFoo";
cmd.Connection = cn;

cmd.CommandType = CommandType.StoredProcedure;

using (var dr = cmd.ExecuteReader ()) {
while (dr.NextResult ()) {
while (dr.Read ()) {
}
}
}
}

Database Transactions

When the writable transaction of a database involves more than one writable actions, it is essential that all the actions are wrapped up in a unit of work called a database transaction.

ACID

The desired characteristics of a transaction are defined by the term ACID:

Member name

Description

Atomicity

All changes within a unit of work complete or none complete; they are atomic.

Consistency

The state of the data is consistent and valid. If upon completing all of the changes the data is considered invalid, all the changes are undone to return the data to the original state.

Isolation

All changes within a unit of work occur in isolation from other readable and writable transactions.

No one can see any changes until all changes have been completed in full.

Durability

Once all the changes within a unit of work have completed, all the changes will be persisted.

No one can see any changes until all changes have been completed in full.
Durability
Once all the changes within a unit of work have completed, all the changes will be persisted.

Syntax

A transaction is started through the connection object and attached to any command which should be run in the same transaction.

Commit should be called upon the transaction upon a successful completion while Rollback should be called if an error occured. This can be achieved by a try catch statement:

using (var cn = dbFactory.CreateConnection ()) {
cn.ConnectionString = connectionString;
cn.Open ();

var cmdOne = cn.CreateCommand ();
cmdOne.CommandText = "spUpdateFoo";
cmdOne.Connection = cn;
cmdOne.CommandType = CommandType.StoredProcedure;

var cmdTwo = cn.CreateCommand ();
cmdTwo.CommandText = "spUpdateMoo";
cmdTwo.Connection = cn;
cmdTwo.CommandType = CommandType.StoredProcedure;

var tran = cn.BeginTransaction ();

try {

cmdOne.Transaction = tran;
cmdTwo.Transaction = tran;

cmdOne.ExecuteNonQuery ();
cmdTwo.ExecuteNonQuery ();

tran.Commit ();
} catch (Exception ex) {
tran.Rollback ();
}
}

The DbTransaction class implements IDisposable and can therefore be used within a using statement rather than a try catch. The Dispose method will be called implicitly upon leaving the scope of the using statement; this will cause any uncommitted changes to be rolled back while allowing any committed changes to remain as persisted. This is the preferred syntax for writing transactions in ADO.NET:

var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;

var dbFactory = DbProviderFactories.GetFactory (providerName);

using (var cn = dbFactory.CreateConnection ()) {
cn.ConnectionString = connectionString;
cn.Open ();

var cmdOne = cn.CreateCommand ();
cmdOne.CommandText = "spUpdateFoo";
cmdOne.Connection = cn;
cmdOne.CommandType = CommandType.StoredProcedure;

var cmdTwo = cn.CreateCommand ();
cmdTwo.CommandText = "spUpdateMoo";
cmdTwo.Connection = cn;
cmdTwo.CommandType = CommandType.StoredProcedure;

using (var tran = cn.BeginTransaction ()) {

cmdOne.Transaction = tran;
cmdTwo.Transaction = tran;

cmdOne.ExecuteNonQuery ();
cmdTwo.ExecuteNonQuery ();

tran.Commit ();
{
}
}
}

Concurrency Issues

Writing and reading to a database does suffer from concurrency issues; multiple transactions occuring at the same time.

Concurrency Issue

Description

Lost Update

Two or more transactions perform write actions on the same record at the same time without being aware of each others change to the data. The last transaction will persist its state of the data overwriting any changes to the same fields made by the first.

Dirty Read

Data is read while a transaction has started but not finished writing. The data is considered dirty as it represents a state of the data which should not have existed.

Nonrepeatable Read

A transaction reads a record multiple times and is presented with multiple versions of the same record due to another transaction writing to the same record.

Phantom Read

Data from a table is read while inserts or deletes are being made. The result set contains missing data from the inserts which have not finished as well as records which are no longer found within table due to the deletes.

Isolation Level

Isolation levels define rules for accesing data when another transaction is running.

The level is set when creating a transaction with the BeginTransaction method and can be read with the IsolationLevel property:

using (var tran = cn.BeginTransaction(IsolationLevel.ReadCommitted)) {

cmdOne.Transaction = tran;
cmdTwo.Transaction = tran;
IsolationLevel isolationLevel = tran.IsolationLevel;

cmdOne.ExecuteNonQuery ();
cmdTwo.ExecuteNonQuery ();

tran.Commit ();
}

The IsolationLevel enum has levels considered lowest as unspecified and highest as Serializable as we traverse through the table; opposite to the amount of concucrency allowed. Chaso and snapshop are considered outside of this low to high range.

 

Isolation Level

Description (MSDN)

Unspecified

The isolation level is undetermined.

A different isolation level than the one specified is being used, but the level cannot be determined.

Often found in older technologies which work differently from current standards such as OdbcTransaction,

ReadUncommitted

A transaction reading data places no shared or exclusive locks upon the data being read.

Other transactions are free to insert, delete and edit the data being read.

This level of isolation allows a high level of concurrency between transactions which is good for performance.

This level of isolation suffers from the possibility of dirty, nonrepeatable and phantom reads.

ReadCommitted

A transaction reading data places a shared lock on the data being read.

Other transactions are free to insert or delete rows but are not given permission to edit the data being read.

This level of isolation does not suffer from dirty reads.

This level of isolation suffers from non repeatable and phantom reads.

This is the default isolation in most databases.

RepeatableRead

A transaction reading data places an exclusive lock on the data being read.

Other transactions are free to insert data into the table but are not free to edit or delete the data being read.

This level of isolation does not suffer from dirty and nonrepeatable reads.

This level of isolation suffers from phantom reads.

Serializable

A transaction reading data places an exclusive lock on the data being read.

Other transactions are not free to insert, update or delete the data being read.

This level of isolation does not suffer from dirty, nonrepeatable or phantom reads.

This level of isolation suffers from the least amount of concurrency and is bad on performance.

Chaos

Allows dirty, lost updates, phantom and nonrepeatable reads but not concurrently with other transactions with a higher isolation level than themselves.

Snapshot

Every writable transaction causes a virtual state of the data before the transaction is made. No exclusive lock is made on the data though isolation is guaranteed by allowing other  transactions to perform read actions on the data at the state before the writable transaction started,

Checkpoints

Checkpoints provide a way of defining temporary save points; data can be rolled back to these intermediatory points.

Save does not persist the changes to the database and a Commit would still be required upon completion of all writable transactions.

using (var tran = cn.BeginTransaction (IsolationLevel.ReadCommitted)) {

cmdOne.Transaction = tran;
cmdTwo.Transaction = tran;

cmdOne.ExecuteNonQuery ();

tran.Save ("Charlie");

cmdTwo.ExecuteNonQuery ();

tran.Rollback ("Charlie");
tran.Commit ();
}

Not all database vendors provide checkpoints. SQL Server does. MySQL does though the MySql Connector ADO.NET data provider does not currently appear to support this feature.

Nested Transactions

If the database vendor / ADO.NET Data Provider does not support checkpoints, nested transactions can always be implemented to achieve the same result. Here a new transaction can be created within the scope of another transaction.

using (var tranOuter = cn.BeginTransaction ()) {

cmdOne.Transaction = tranOuter;
cmdOne.ExecuteNonQuery ();

using (var tranInner = cn.BeginTransaction ()) {
cmdTwo.Transaction = tranInner;

cmdTwo.ExecuteNonQuery ();
tranInner.Rollback ();
}

tranOuter.Commit ();
}

ADO.NET – Disconnected Layer

Intro

The disconnected layer allows caching of data offline. The data can then be queried and manipulated completely disjoint from the data source before being persisted at a later time.

The main class is the DataSet which represents a subset of the data from a data source. Each table is represented by a DataTable. The schema, relationships and constraints can all be modelled.

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,

Getting Data

The DataAdapter class is used to cache a subset of a database offline into a series of DataTables.

If the executed query yields only one result set a DataTable can be filled instead of a DataSet:

ConfigurationManager.ConnectionStrings ["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;

var queryString = "SELECT A,B,C FROM MyTable";

var dbFactory = DbProviderFactories.GetFactory (providerName);

// Create and configure a connection
 var connection = dbFactory.CreateConnection ();
 connection.ConnectionString = connectionString;

// Create and configure a DbCommand
 var command = dbFactory.CreateCommand ();
 command.CommandText = queryString;
 command.Connection = connection;

// Create and configure a DbDataAdapter
 var adapter = dbFactory.CreateDataAdapter ();
 adapter.SelectCommand = command;

// Fill A DataTable
 var dataTable = new DataTable ();
 adapter.Fill (dataTable);

If the result set yields multiple tables a DataSet can be used to hold a series of DataTables; each populated with a yielded result set.

Each DataTable is named by default Table, Table1 etc however the TableMappings property can be used to rename the tables to something sensible:

 var queryString = "SELECT A,B,C FROM MyTable;
 SELECT A,B,C FROM YourTable";

var dbFactory = DbProviderFactories.GetFactory (providerName);

// Create and configure a connection
 var connection = dbFactory.CreateConnection ();
 connection.ConnectionString = connectionString;

// Create and configure a DbCommand
 var command = dbFactory.CreateCommand ();
 command.CommandText = queryString;
 command.Connection = connection;

// Create and configure a DbDataAdapter
 var adapter = dbFactory.CreateDataAdapter ();
 adapter.SelectCommand = command;

// Rename table mappings
 adapter.TableMappings.Add ("Table", "MyTable");
 adapter.TableMappings.Add ("Table1", "YourTable");

// Fill A DataTable
 var dataSet = new DataSet ();
 adapter.Fill (dataSet);

Calling Stored Procedures

Stored procedures, in built functions and user defined functions call all be called to fill a DataSet or DataTable object. The connected layer (previous chapter) details how to configure command objects with the required parameters. Below shows how to call a stored procedure which takes no parameters:

var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var dbFactory = DbProviderFactories.GetFactory (connectionDetails.ProviderName);

// Create and configure a connection
 using (var connection = dbFactory.CreateConnection ()) {
 connection.ConnectionString = connectionDetails.ConnectionString;

// Create and configure a DbCommand
 var command = dbFactory.CreateCommand ();
 command.Connection = connection;
 command.CommandText = "spGetFoo";
 command.CommandType = CommandType.StoredProcedure;

// Create and configure a DbDataAdapter
 var adapter = dbFactory.CreateDataAdapter ();
 adapter.SelectCommand = command;

// Fill A DataTable
 var dataSet = new DataSet ();
 adapter.Fill (dataSet);

return dataSet;
 }

Accessing Data

Once a DataSet or DataTable has been filled it’s contents can be accessed.

DataSet

A DataSet is a class which holds a collection of DataTables which can be accessed by the Tables property:

var tables = dataSet.Tables;

The name of the data set can also be returned via the DataSetName property:

var name = dataSet.DataSetName;

The ExtendedProperties property exposes an instance PropertyCollection which can be used as a scratch collection to store arbitrary information about the DataSet or its contents as seen fit by the user:

PropertyCollection properties = dataSet.ExtendedProperties;

PropertyCollection inherits from HashSet:

properties.Add( "Connected", DateTime.Now);

foreach(DictionaryEntry de in properties)
{
var aKey = de.Key;
var aValue = de.Value;
}
</pre>
<h2>DataTable</h2>
<pre>
The DataTable represents a cache of a table or a subset of a table within a database.

A collection of DataTables can be accessed via the Tables property upon DataSet, it returns a TableCollection:


DataTableCollection tables = dataSet.Tables;

The TableCollection allows access to the contained tables via their table mapping name and it’s ordinal position:

DataTable dataTable = dataSet.Tables[0];
DataTable namedDataTable = dataSet.Tables["MyTable"];

The DataTable has properties to return its name, its owning DataSet and its Primary Key:

var tableName = dataTable.TableName;
var parentDataSet = dataTable.DataSet;
var primaryKey = dataTable.PrimaryKey;

DataRow

Each DataTable stores a collection of DataRows which represent the contained database records. The DataRowCollection can be accessed via the Rows property upon the DataTable.

DataRowCollection rows = dataTable.Rows;

var count = dataTable.Rows.Count;

A DataRow can be access via its ordinal position within the collection:

DataRow row = dataTable.Rows[0];

DataColumn

A DataColumn represents a column within a DataTable. It can be access via its ordinal position or by its name.

A DataColumnCollection holds a collection of DataColumns and can be accessed via the Columns property of the DataTable.

Alternatively a DataColumn can be accessed via the index method [] from the DataRowCollection.

var count = dataTable.Columns.Count;

DataColumnCollection columns = dataTable.Columns;
DataColumn columnOne = dataTable.Columns[0];
DataColumn columnTwo = dataTable.Columns["Field"];

int idValue = (int)dataTable.Rows [0] [0];
string columnFour = (string)dataTable.Rows [0] ["StringField"];

If a DataColumn is requested by a name which does not exist an ArgumentException is thrown.

If a DataColumn is requested by its ordinal position which does not exist a IndexOutOfRangeException is thrown.

Meta information about a DataColumn can found by a series of properties upon the DataColumn class. Here was access its name, ordinal position and containing DataTable.

var columnName = column.ColumnName;
var ParentDataTable = column.Table;
var position = column.Ordinal;

The columns value and whether it can take the value of null:

var defaultValue = column.DefaultValue;
var isNullable = column.AllowDBNull;

Type

The DataType property of a DataColumn returns a value from the System.Data.Type enum which represents the data type of the column which maps directly to .NET types:

* Boolean
* Byte
* Char
* DateTime
* Decimal
* Double
* Guid
* Int16
* Int32
* Int64
* SByte
* Single
* String
* TimeSpan
* UInt16
* UInt32
* UInt64

DataTableReader

Though it is possible to access the data as defined by each contained entity of the DataSet above, the DataTableReader class provides a read-only forward iterative access to data within a DataTable similar to a DataReader:

using (DataTableReader reader = dataTable.CreateDataReader ()){
while (reader.Read ()) {
// Access via the fields name or ordinal position
string aValue = (string)reader [0];
int bValue = (int)reader ["FieldOne"];

// Access should be preferred through strongly typed values
var isNull = reader.IsDBNull (1);
var aBool = reader.GetBoolean (1);
var aDateTime = reader.GetDateTime (1);
var aDecimal = reader.GetDecimal (1);
var aDouble = reader.GetDouble (1);
var aChar = reader.GetChar (1);

// Access the .NET field type or the data providers type
Type aType = reader.GetFieldType (1);
Type bType = reader.GetProviderSpecificFieldType (1);

// Access field name and ordinal position
var fieldName = reader.GetName (1);
int position = reader.GetOrdinal ("FieldOne");
}
}

Querying Data

A DataSet can be queried to return an Array of DataRows which have passed a predicate which is supplied in a syntax similar to SQL upon a DataTable.

Return all rows:

DataRow[] rows = dataTable.Select();

Return all rows passing a predicate:

DataRow[] rows = dataTable.Select("FieldA='AValue'");

The syntax for querying a DataTable and also a DataView (see below) is relatively simple though relatively extensive.

The following section defines the syntax with an example usage:

Literals

Rules for defining literals

Type Description Example
Number Numbers are simply referenced inside the query string without quotes. “FieldA = 10”
Sting Strings are contained within the query string inside single quotes. “FieldA = ‘FooFooey'”
Dates Dates are contained between prefixing and suffixing # symbols.

They can be segmented with – and / and can also contain times”Date = #12/31/2008#”

“Date = #2008-12-31#”

“Date = #12/31/2008 16:44:58#”Escape CharsThe following characters must be escaped: ~ ( ) # \ / = > < + – * % & | ^ ‘ ” [ ].

They can be escaped with [] or \“FieldA = ‘[Name*Surname]’”

Comparison Operators

Comparison operators allow comparing two operands.

Type Description Example
= Equals “AField = 1”
<> Not equals “AField <> 1”
< Less than “AField < 1”
<= Less than or equal to “AField <= 1”
> Greater than “AField > 1”
>= Greater than or equal to “AField >= 1”
IN The value is contained within a set of values “AField IN (‘1, 2, 3)”
LIKE Wildcard comparison.

Wildcards % and * can be used for one or one or more characters respectively.

They can only be placed at the start or the end of a value but not in the middle.”AField LIKE ‘Baz*'”

 

Logical Operators

Logical operators work between other operators

Type Description Example
NOT Negates a LIKE or IN expression “AField NOT IN (1, 2, 3)”
AND Two expressions must be evaluated to true “Afield = ‘Avalue’ AND BField >= 50”
OR Either of two expressions must evaluate to true. “Afield = ‘Avalue’ OR BField >= 50”

The precedence of the operators is the comparison operators is NOT, AND and then OR.

Arithmetic Operators

Operators for working on numeric fields.

Type Description Example
+ Addition “AField + 10 = 12”
Subtraction “AField – 10 = 12”
* Multiplication “(AField * 0.2) > 10”
/ Division “(Afield / 0.8) > 10”
% Modulus “(Afield % 0.8) > 10”

String Operators

Operator(s) for working on string fields.

Type Description Example
+ Concatenation “(AField + BField) = ‘James Pond'”

Parent-Child Relation Referencing

Relationships between tables can be traversed with the notation ..

Parent to children traversals must be used within an aggregate function due to the 1:n nature (they can potentially return a set of records).

Where multiple relationships exist on the same parent column; the relationship name must be used. (RelationsipName).ChidlField.

We look at adding relationships into the schema in the following sections.

Aggregate Functions

Aggregate functions work on a single column of a set of data and return a value.

Type Description Example
SUM Sum “SUM(ParentFK.ChildField) > 10″
COUNT Count “COUNT(ParentFK.ChildField) > 10”
MIN Minimum “MIN(ParentFK.ChildField) > 10”
MAX Maximum “MAX(ParentFK.ChildField) > 10”
AVG Average (Mean) “AVG(ParentFK.ChildField) > 10”

Functions

Various other functions:

Type Description Example
CONVERT Convert a value to a .NET type “Convert(FieldAString, ‘System.Int32’) > 10”
LEN Returns the length of a string “LEN(FieldA) > 10”
ISNULL Replaces NULL with a defined value “IsNull(FieldA, -1)”
IIF Evaluates a condition; returns the second operand if true or the third if false. “IIF(FieldA = 1, ‘One’, ‘Not One’)”
TRIM Removes white space from the start and end of a string “TRIM(FieldA) = ‘Hello'”
SUBSTRING Returns a subset of a string between the start and end character positions “SUBSTRING(FieldA, 1, 2)=’UB’”

Sorting Data

The last parameter of the Select method allows a sort criteria:

Sort by a field ascending:

var rows = dataTable.Select(string.Empty, "FieldA");

Sort by multiple fields:

var rows = dataTable.Select(string.Empty, "FieldA, FieldB");

Sort descending:

var rows = dataTable.Select(string.Empty, "FIELDA DESC");

DataView

A DataView is a customized view of a DataTable. It allows sorting, filtering, searching, and navigating data. The data is connected to the DataTable it was created against; changes to the data affects the data in the parent DataTable.

A Data View is created from a DataTable:

var dv = new DataView(dataTable);
[/sourcecode ]

RowFilter allows providing predicates to a DataView. It takes the same syntax string as DataTable.Select() however instead of returning an array of DataRows which pass the condition it changes the view of the data automatically:


dv.RowFilter = "Field='Value";

All the syntax of querying a DataTable is applicable for the RowFilter property of the DataView mentioned above.

Schema

The following is a quick run through of how to manually create the schema of a DataSet and its contained entities:

DataSet

Instantiate a DataSet:

var dataSet = new DataSet ("MyDataSet");
dataSet.ExtendedProperties ["Create"] = DateTime.Now;

DataTable

Instantiate a DataTable and add it into a DataSet:

DataTable dataTable = new DataTable ("MyTable");
dataSet.Tables.Add (dataTable);

DataColumns

Add columns into a DataTable:

var column = new DataColumn ("ID", typeof(int)) {
Caption = "My Table ID",
ReadOnly = true,
AllowDBNull = false,
Unique = true
};

dataTable.Columns.Add(column);

PrimaryKey

The PrimaryKey property is an Array of DataColumns which represent the primary key:

dataTable.PrimaryKey = new DataColumn[]
{ dataTable.Columns[0], dataTable.Columns[1] };

AutoIncrement Column

AutoIncrement columns provides an auto generating integer similar to an identity index, though uniqueness is only guaranteed locally.

The seed is the starting index value and the increment is the amount to increment each time:

var seedColumn = new DataColumn ("ID", typeof(int)) {
Caption = "My Table ID",
ReadOnly = true,
AllowDBNull = false,
Unique = true,
AutoIncrement = true,
AutoIncrementSeed = 0,
AutoIncrementStep = 1,
};

Relationships

Creation

Relationships are created by mapping the foreign keys between DataTables contained within a DataSet:

A DataRelation is created by defining the relationship name, the parent key column and then the child key column. The DataRelation should then be added into the Relations property of a DataSet which returns a DataRelationCollection:

var dr = new DataRelation ("ParentChild",
dataSet.Tables ["Parent"].Columns ["ID"],
dataSet.Tables ["Child"].Columns ["ParentID"]);

dataSet.Relations.Add (dr);

Navigation

Navigation between child and parent rows or viceversa can be performed once a relationship has been set up. In both cases the relationship should be used against either the child or parent rows:

Children Rows:

dataSet.Tables ["Parent"].Rows[0].
GetChildRows(dataSet.Relations["ParentChild"]);

Parent Row:

dataSet.Tables ["Child"].Rows [0].
GetParentRow (dataSet.Relations ["ParentChild"]);

Constraints

ADO.NET provides two forms of constraints, foreign keys and unique. The enforcement can be disabled or enabled with the EnforceConstraints property. By default constraints are enforced:

dataSet.EnforceConstraints = false;

Foreign Key Constraints

ADO.NET provides a Delete and Update constraints for maintaining data integrity between DataRows connected via relationships. They both default to None:

Rule Description
Cascade During update, related rows have their affected columns updated to that of the related row.

During delete, related rows are themselves deleted.SetNullRelated rows have affected columns set to DBNullSetDefaultRelated rows have affected columns set to their default valueNoneNo action is performed; this is the default action

The AcceptRejectRule defines what happens to child rows when AcceptChanges or Rejected changes is called upon the parent row or the containing DataSet; we talk about these methods more during the section on manipulating and persisting changes.

AcceptRejectRule Description
Cascade Changes are accepted or rejected upon child rows
None No action is performed upon child rows; this is the default action.
var parent = dataSet.Tables[0].Columns[0];
var child= dataSet.Tables[0].Columns[1];

var fk = new ForeignKeyConstraint( "FK", parent, child );

fk.DeleteRule = Rule.SetNull;
fk.UpdateRule = Rule.Cascade;
fk.AcceptRejectRule = AcceptRejectRule.None;

dataSet.Tables [0].Constraints.Add (fk);

Relationships ( see above ) automatically add in a constraint.

Unique Constraints

Unique constraints ensure that no column, or combination of columns contain duplicates.

A unique constraint for all columns contained within the primary key is automatically created.

For a single column requiring a unique constraint the Unique property on DataColumn can be set

var uniqueColumn = new DataColumn ();
uniqueColumn.Unique = true;

For multiple columns a UniqueConstraint class containing an array of columns should be created and added to the DataTable’s Constraints property which returns a ConstraintCollection:

dataTable = dataSet.Tables["MyTable"];
Var uniqueConstraint = new UniqueConstraint(new DataColumn[] {
dataTable.Columns["FieldA"],
dataTable.Columns["FieldB"]
});

dataSet.Tables["MyTable"].Constraints.Add(uniqueConstraint );

Reading Schema

Calling DbDataAdapter.Fill(aDataSet) is optimized for read-only scenarios. Only the schema required to access the data itself is retrieved; the data, the type and the name of the column.

The schema can be loaded by a call to DataAdapter.FillSchema or with DataAdapter.Fill with explicit reference to load the missing schema.

The bounds of the schema returned is defined by the SelectCommand and the returned columns defined upon the DataAdapter. If multiple result sets are found, only the first defined within the SelectCommand will have the schema returned.

The following is a synopsis of what schema is actually loaded:

A DataTable is added into the DataSet with all columns returned by the select statement. Each column will have the following schema set:

MissingSchemaAction Description
AllowDBNull Whether the column allows null data.
AutoIncrement Whether the column should be incremented. AutoIncrementStep and AutoIncrementSeed must be set manually.
MaxLength The maximum size of data the column can take
Readonly If the column is readonly
Unique If the column has a unique constraint placed upon it

The primary key information is set if any of the primary key columns are returned from the data source. If the primary keys are not returned from the data source any unique columns returned are used.

Foreign key constraints are not returned and must be loaded manually.

MissingSchemaAction

MissingSchemaAction property on DbDataAdapter is used to set the amount of schema to loaded during calls to DataAdapter.Fill() and more importantly what to do if the schema already contained within a DataSet does not match the incoming schema.

adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;
adapter.Fill(aDataSet);
MissingSchemaAction Description
Add Adds the missing tables and columns as required.
AddWithKey Adds the missing tables and columns as required. Also ensures that the primary key and it’s constrains match the incoming schema. Foreign key constraints are not included and must be created explicitly.
Error An InvalidOperationException is thrown if any columns exist in the DataTable which don’t exist in the data source.
Ignore Any columns in the DataTable which are not in the data source are simply ignored.

FillSchema

The DataAdapter.FillSchema() is used load the schema within a data source disjointly from a call to DataAdapter.Fill().

SchemaType Description
Mapped Any existing table and column mappings within the DataSet are ignored in favor of those retrieved from the data source.
Source Any existing table and column mappings within the DataSet are used in favor of those retrieved from the data source.

It must be noted that FillSchema being called before a Fill method leaves the connection open. If a FillSchema is called after a Fill method the connection is closed afterwards.

Fill vs. FillSchema

It must be pointed out that Fill will load schema every call if the schema has been asked for. It is advised to use FillSchema when multiple calls to Fill is made which will result in the same schema being read.

It is advised to use neither if only a read only data is required as the schema is not required.

It is advised to use neither if XML data is being used and the defined XSD schema is to be used instead of the auto generated schema when reading the XML.

It is advised to use neither when using Visual Studio design tools as the schema for updating the data source is auto generated.

Manipulating Data

Data is manipulated within a DataSet by either deleting, editing or inserting DataRows.

Deleting

DataRows can be deleted via the Delete method on an instance of the DataRow class:

aDataTable.Rows[0].Delete();
aDataRow.Delete();

Updating

DataRows can be modified via the index methods upon an instance of the DataRow class:

Updating Rows via the DataRowCollection:

aDataTable.Rows[0][0] = "NewValue";
aDataTable.Rows[0]["FieldA"] = "NewValue";

Updating via a DataRow:

aDataRow[0] = "NewValue";
aDataRow["Field"] = "NewValue";

Inserting

Inserting a new DataRow into a DataRowCollection entails creating a new DataRow, setting the data and then adding the DataRow into the DataRowCollection:

Create a template row:

DataRow dataRow = aDataTable.NewRow();

Setting the data:

dataRow [0] = 1;
dataRow ["Field"] = "Value";

Adding the row into the table:

aDataTable.Rows.Add (dataRow);

Accepting and Rejecting Changes

A transactional state of data is maintained throughout editing. The changes can be rejected and committed through a series of methods on the DataRow and DataTable classes.

Method Description
DataRow.BeginEdit() Notifies the DataRow as being edited. Disables change events on the DataRow
DataRow.CancelEdit() Notifies the DataRow as no longer being edited. Any changes are discarded.
DataRow.EndEdit() Notifies the DataRow as no longer being edited. The changes are signalled to be kept but no changes are committed locally into the DataSet until AcceptChanges has been called.
DataRow.AcceptChanges() Notifies the DataRow as no longer being edited. Pending changes are all committed locally into the DataSet.
DataTable.AcceptChanges() All DataRows which have had uncommited changes have their changes committed locally into the DataSet.
DataRow.RejectChanges() All DataRows which have had uncommited changes have their changes undone.

Validating Data

Validation errors can be placed upon fields within a DataRow with the SetColumnError method:

dataRow.SetColumnError(0, "Invalid Data");
dataRow.SetColumnError("FieldName", "Invalid Data");

Validation errors can then be retrieved with the following methods:

var dsHasErrors = aDataSet.HasErrors;
var dtHasErrors = aDataSet.Tables [0].HasErrors;
var errorRows = aDataSet.Tables [0].GetErrors ();
var errorCols = dataRow.GetColumnsInError ();
var errorMsg = dataRow.GetColumnError (0);

Validation messages should be cleared once an action has been made upon them. This can be at row level via the ClearErrors method or at the DataSet level with RejectChanges:

errorRows[0].ClearErrors();

The error messages can be iterated over and the changes accepted or rejected::

var errorResult = new StringBuilder ();

var dataSetChanged = DS.GetChanges (DataRowState.Modified);

if (dataSetChanged != null && dataSetChanged.HasErrors) {
foreach (DataTable dataTable in dataSetChanged.Tables) {
if (!dataTable.HasErrors) {
continue;
}

foreach (var dataRow in dataTable.GetErrors()) {
foreach (var dataColumn in dataRow.GetColumnsInError()) {
var errorDesc = dataRow.GetColumnError (dataColumn);
// Implicitly called with aDataSet.RejectChanges:
//dataRow.ClearErrors();
errorResult.AppendLine (errorDesc);
}
}
}

DS.RejectChanges ();
} else if (dataSetChanged != null) {
Adapter.Update(dataSetChanged);
// implicitly called with adapter
//Update:_dataSet.AcceptChanges();
}

return errorResult.ToString ();

Persisting Data

Data persistence has two levels; locally into the disconnected DataSet and then at a later time back into the data source.

Row State

ADO.NET maintains a status against each row to track local changes which are then used to persist the data back to the database in a single hit.

The RowState allows analysis of the state of the row in relation to its state when it was loaded into the DataSet and against any changes which are occurring or have occurred.

The state of a row can be returned by the RowState property:

DataRowState state = aDataRow.RowState;

The DataRowState Enum values:

DataRowState Description
Added The row has been added into the DataRowCollection but the changes have not been accepted via the AcceptChanges methods.
Deleted The row has been deleted.
Detached The row has been created but has not been added into a DataTable.
Modified The row has been modified but the changes have not been accepted via the AcceptChanges methods.
Unchanged The row has not changed since it was loaded into the DataTable or since a call to AcceptChanges or RejectChanges has been made.

Consider the following process flow in the life of a DataRow: creation, add, modify, and deletion. The state of the row will change as follows:

var row = aDataTable.NewRow (); // RowState is Detached

aDataTable.Rows.Add (row); // RowState is Added

row = aDataTable.Rows [0];
row ["Field"] = 10; // RowState is Modified

row.AcceptChanges (); // RowState is Unchanged

row ["Field"] = 11; // RowState is Modified

row = aDataTable.Rows [1];
aDataTable.Rows [0].Delete (); // RowState is Deleted

Row Versions

The motion of state is actually more complex; up to three versions of the DataRow could actually exist at any time.

Any three of the following row versions can exist:

DataRowVersion Description
Current A row containing the current data.
Default The default version of DataRowState.

For a DataRowState value of Added, Modified or Deleted, the default version is Current.

For a DataRowState value of Detached, the version is Proposed.OriginalA row containing the original dataProposedA row containing proposed data

The row versions are set and maintained by the following methods:

Method Description
DataRow.BeginEdit() Change the row’s value, the Current and Proposed values become available
DataRow.CancelEdit() The Proposed value is deleted.
DataRow.EndEdit() The Proposed value becomes the Current value
DataRow.AcceptChanges() The Original value becomes identical to the Current value

If this is called before DataAdapter.Update no changes to the row within the

data source will be persisted. (It is called implicitly by DataAdapter.Update)DataTable.AcceptChanges()The Original value becomes identical to the Current value

If this is called before DataAdapter.Update no changes to the data within the DataSet will be persisted back into the data source. (It is called implicitly by DataAdapter.Update)DataRow.RejectChanges()The Proposed value is discarded, and the original version becomes Current

The index method upon DataRow has been overloaded to take an instance of the DataRowVersion enum. The value of the field upon the specified version is returned.

var aValue = aDataTable.Rows [0] ["AField",
DataRowVersion.Original].ToString ();
[/sourcecode language='csharp']</pre>
<h2>DataAdapters</h2>
<pre>
We have already seen the DataAdapter previously for filling a DataSet via the SelectCommand. The DataProvider also provides an InsertCommand, UpdateCommand and DeleteCommand. These are SQL template commands used to automatically generate SQL which is then persisted back to the database.

The SelectCommand, InsertCommand, UpdateCommand and DeleteCommand can all be configured manually or via Visual Studio designer.

Only the SelectCommand is required if only the DataSet.Fill() is required with no data persistence.

The InsertCommand, UpdateCommand and DeleteCommand are all required if persistence is required.

Persistence is made by calling Update on the DataAdapter. It automatically generates the required SQL with insert, update and delete statements to create, edit or delete the database data to the state contained within the DataSet. It uses the RowState of each row to determine what and if any action is required.

One advantage the DataAdapter has is that the connection is automatically managed. The only requirement is the creation of the connection, connection string and the assignment to the DataAdapter. The connection and any required transactions is automatically handled inside the DataAdapter.


var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var dbFactory = DbProviderFactories.GetFactory (connectionDetails.ProviderName);

Adapter = dbFactory.CreateDataAdapter ();

// Create and configure a connection
using (var connection = dbFactory.CreateConnection ()) {
connection.ConnectionString = connectionDetails.ConnectionString;

// Create Select Command
DbCommand selCmd = dbFactory.CreateCommand ();
selCmd.CommandText = "SELECT idMyTable,FieldA FROM MyTable";
selCmd.Connection = connection;

Adapter.SelectCommand = selCmd;

// Parameters
var fieldAParam = dbFactory.CreateParameter ();
fieldAParam.DbType = DbType.String;
fieldAParam.ParameterName = "@FieldA";
fieldAParam.SourceColumn = "FieldA";
fieldAParam.DbType = DbType.String;

var idParam = dbFactory.CreateParameter ();
idParam.DbType = DbType.Int32;
idParam.ParameterName = "@idMyTable";
idParam.SourceColumn = "idMyTable";
idParam.DbType = DbType.Int32;

// Configure Insert Command
var insCmd = dbFactory.CreateCommand ();

insCmd.CommandText = "INSERT INTO MyTable (FieldA) VALUES (@FieldA)";

insCmd.Parameters.Add (fieldAParam);
insCmd.Connection = connection;
insCmd.CommandType = CommandType.Text;
Adapter.InsertCommand = insCmd;

// Delete Command
var delCmd = dbFactory.CreateCommand ();

delCmd.CommandText = "DELETE FROM MyTable WHERE idMyTable = @idMyTable";
delCmd.Parameters.Add (idParam);
delCmd.Connection = connection;
delCmd.CommandType = CommandType.Text;
Adapter.DeleteCommand = delCmd;

// Configure Update Command.
var upCmd = dbFactory.CreateCommand ();

upCmd.CommandText = "UPDATE MyTable SET FieldA = @FieldA WHERE idMyTable = @idMyTable";
upCmd.Parameters.Add (idParam);
upCmd.Parameters.Add (fieldAParam);
upCmd.Connection = connection;
upCmd.CommandType = CommandType.Text;
Adapter.UpdateCommand = upCmd;
}
Adapter.Fill (DS);

SqlCommandBuilder

The SQLCommandBuilder automatically configures the insert, updated and deleted commands based upon the SelectCommand.

DataAdapter.Update() reads the databases schema to generate the sql, however this requires additional database reads and as such degrades performance.

SQLCommandBuilder can only be used if all the following conditions hold true:

* The select command interacts with a single table
* The table has a defined primary key
* All columns making the primary key must be returned with the select command

SQLCommandBuilder usage:

var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;
var dbFactory = DbProviderFactories.GetFactory (providerName);

var adapter = dbFactory.CreateDataAdapter ();

// Create and configure a connection
var connection = dbFactory.CreateConnection();
connection.ConnectionString = connectionString;

// Create Select Command
var selCmd= dbFactory.CreateCommand();
selCmd.CommandText = "SELECT A,B,C FROM MyTable";
selCmd.Connection = connection;
adapter.SelectCommand = selCmd;

// Create and configure DbCommandBuilder
var builder = dbFactory.CreateCommandBuilder();
builder.DataAdapter = adapter;

// Create and fill a DataTable
var dt = new DataTable("MyTable");
adapter.Fill (dt);
// Changes would occur here….

// Persist changes
adapter.Update(dt);

Strongly Typed DataSets

The XSD tool can be used to create a strongly typed DataSet. The DataSet can be filled, manipulated, queried and persisted back to the DataSet. The benefit is that it exposes named properties which are generated from the provided schema.

Visual Studio Server Explorer and SQL Server Management Studio can be used to generate an XSD for a subset of a database’s schema. Similarly MySQL Workbench offers similar functionality for MySQL.

The XML Schema Definition Tool (xsd.exe) can be used to create XSD from C# classes and XML and viceversa. The tool can be accessed via the Visual Studio command prompt.

Mono offers a number of tools including DTD2XSD and Mono XSD.

Creating

A strongly typed DataSet can be created via the xsd.exe tool and an xsd file representing the required schema of the DataSet to be generated:

xsd.exe /d /l:CS MyDataSet.xsd /eld /n:MyDataSet

Switch Description
/d A DataSet should be generated
l:CS The generated code should be in the C# language
/eld The generated DataSet should allow Linq queries to be executed against it.
/n The namespace the generated code should be placed into

The generated code is a series of classes inheriting from the inbuilt DataSet, DataTable, DataRow etc and all named after their XSD schema definitions.

Filling

The strongly typed dataset generated can be filled:

var myDataSet = new MyDataSet ();
var myTable = new MyDataSet.MyTableDataTable ();

adapter.Fill (people);</pre>
<h2>Accessing Data</h2>
<pre>
foreach (MyDataSet.MyTableRow myTableRow in myTable) {
var id = myTableRow.Id;
var name = myTableRow.Name;
var isMale = myTableRow.IsMale;
}</pre>
<h2>Inserting Rows</h2>
<pre>
MyDataSet.MyTableRow newRow = myTable.NewMyTableRow();

newRow.Name = "Bobby McFroggy";
newRow.Id = 1;
newRow.IsMale = true;

myTable.AddMyTableRow(newRow);

adapter.Update(myTable);

DataTable also exposes a method to create and add a DataRow from the required field values:

myTable.AddMyTableRow (2, "James Pond", true);

More

Strongly typed methods for finding, deleting and also executing stored procedures will be generated.

LINQ to DataSet

DataSet can be queried and manipulated with Linq due to a series of extension methods within System.Data.DataSetExtensions.dll. The methods extends elements within the System.Data namespace namespace:

Extension Name

Description

DataTableExtensions

Allows querying DataTables with Linq

DataRowExtensions

Allows querying DataRows with Linq

TypedTableBaseExtensions

Allows querying strongly typed datasets with Linq

The functionality is identical to Linq though here are some pointers for working with DataSets and Linq.

To use Linq on a DataTable, AsEnumerable() must be called.

Exposing Linq:

IEnumerable queryOne =
from row in aDataTable.AsEnumerable ()
select row;

Predicates:

IEnumerable queryTwo =
aDataTable.AsEnumerable ().
Where (p => p.Field ("Name") == "Bob");

Ordering:

IEnumerable queryThree =
from p in aDataTable.AsEnumerable ()
orderby p.Field("Age") descending
select p;

With anonymous types:

var query = aDataTable.AsEnumerable ().Select (p => new
{
Name = p.Field ("Name"),
Age = p.Field ("Age")
});

ADO.NET & XML

ADO.NET can be used with XML as a data source through exposed methods on the DataSet class.

Read XML from a file:

aDataSet.ReadXml("data.xml");

Write a DataSet to an XML file:

aDataSet.WriteXml("data.xml");

Populate a DataSets schema from a an xsd file:

aDataSet.ReadXmlSchema("data.xsd");

Write a DataSets schema into a xsd file:

aDataSet.WriteXmlSchema("data.xsd");

Binary Serialization

A DataSet can be serialized to XML, Soap or binary. Here we show binary serialization:

Serialize A DataSet

aDataSet.RemotingFormat = SerializationFormat.Binary;

using (var fs = new FileStream ("data.dat", FileMode.Create)) {
var bf = new BinaryFormatter ();
bf.Serialize (fs, aDataSet);
}

DeSerialize A DataSet

aDataSet.RemotingFormat = SerializationFormat.Binary;

using (Stream fs = File.OpenRead ("data.dat")) {
var bf = new BinaryFormatter ();
aDataSet = (DataSet)bf.Deserialize (fs);
}

ADO.NET – Connections And Data Providers

Intro

ADO.NET is a framework for accessing, interrogating, manipulating and persisting data in relational databases.

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,

Data Providers

The framework abstracts consuming code away from the specifics of each vendor’s database implementation allowing code to be written which is virtually agnostic to the data source.

Each ADO.NET capable database provides an ADO.NET data provider which handles the databases specific implementation needs.

Through a series of classes, abstract classes and interfaces, ADO.NET provides functionality which is consistant regardless of the database vendor.

ADO.NET data providers exist for most database vendors. A list can be found here: http://msdn.microsoft.com/en-gb/data/dd363565.aspx

Configuring A Database Provider

Depending upon your choice of database and data provider, you might need to configure the provider to be used with .NET.

SQL Server is configured automatically on Windows/.NET. Under Linux/Mono SQLIte is configured automatically yet MySQL is not.

If you are using MySQL I would recommend the Connector/Net ADO.NET data provider. You can download the driver here: http://dev.mysql.com/downloads/connector/net/

The data provider is written in .NET and is provided as an assembly which needs to be installed into the GAC.

sudo gacutil -i MySql.Data.dll

To check the installation went ok. You can list the contents of the gac with the gacutil command. The command below uses the POSIX command grep to filter the results:

gacutil -l | grep My

Data providers are registered within the machine.config of the .NET version you are running.

For windows the machine.config location will look something like:

C:\Windows\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config

For Mono and Debian based Linux the machine.config location will look something like:

/usr/local/etc/mono/2.0/machine.config

The configuration will be specific to the version of your data provider. You should check the installation page of you vender to determine this. The following is for MySQL Connector provider version 6.3.5.0:

<add name=”MySQL Data Provider”
invariant=”MySql.Data.MySqlClient”
description=”.Net Framework Data Provider for MySQL”
type=”MySql.Data.MySqlClient.MySqlClientFactory, MySql.Data,
Version=6.3.5.0, Culture=neutral,
PublicKeyToken=c5687fc88969c44d” />

This should be copied into the Machine.Config within the <DbProviderFactories> node which sits within the <System.Data> node. You should find other providers already configured.

The data provider can then be referenced as any other assembly within the GAC.

Bases Classes

ADO.NET provides a series of core and abstract classes; they provide common functionlaity that is not specific to any data priovider.

ADO.NET also provides a series of interfaces. ADO.NET providers implement these interfaces.

The data providers are welcome to inherit from the abstract classes, and most do, to reuse the common functionality associated with them.

ADO.NET API consumes the interfaces rather than the abstract classes to allow a data provider with greater control of their required funcitonality.

Below we describe the core abstract classes and their interfaces.

Class / Name Space

Description

System.Data

Provides a common namespace for most of the abstract classes and interfaces which each data provider inherit from or implement.

It also contains common entities which are not specific to a data provider; datasets, tables, rows, columns, relational constraints etc.

DbConnection

IDbConnection

Supports for configuring connections

DbTransaction

IDbTransaction

Support for database transactions

DbCommand

IDbCommand

Support for calling SQL statements, stored procedures and parameterized queries.

Provides access to the an instance of the  DataReader class through the ExecuteReader() method

DbParameterCollection

IDataParameterCollection

Provides a collection of DbParameters to IDBCommand.

DbParameter

IDbDataParameter

IDataParameter

Provides a parameter to sql statements and stored procedures. It is used by IDBCommand.

DbDataReader

IDataReader

Provides a read only iterative view of the data returned from a SQL statement.

Provides access to strongly typed data via field names or their ordinal positions.

DbDataAdapter

IDbDataAdapter

IDataAdapter

Provides access to a cached subset of data, monitoring of any changes which can then be persisted back into the database at a later time.

Below we map the abstract classes and interfaces to the SQL Server and MySQL data provider classes. As you can see the naming convention is consistent:

Class / Name Space

SQL Server

MySQL

System.Data

System.Data.SqlClient

MySql.Data.MySqlClient

DbConnection

SqlConnection

MySqlConnection

DbTransaction

SqlTransaction

MySqlTransaction

DbCommand

SqlCommand

MySqlCommand

DbParameterCollection

SqlParameterCollection

MySqlParameterCollection

DbParameter

IDbDataParameter

IDataParameter

SqlParameter

MySqlParameter

DbDataReader

IDataReader

SqlDataReader

MySqlDataReader

DbDataAdapter

IDbDataAdapter

IDataAdapter

SqlDataAdapter

MySqlDataAdapter

Though it is possible to code directly to the MySQL or SQL Server data provider classes, it is not recommended and considered bad practice.

All the examples will be make use of the DbProviderFactories class and the associated DbProviderFactory class to create instance of the required data provider specific classes. The DbProviderFactories uses late binding to to determine which class the data provider has provided as part of their implementation.

Database Connections

Connection Strings

ADO.NET allows a virtually database agnostic approach to database access within .NET.

The connection strings are vendor specific and can turn off and on a number of features each of which might be specific to each vendor.

In short the minimum requirements is the server name, the database name and some form of login criteria. As each vendors connection string is specific to the vendor and data provider. Here are some examples:

Database
Connection String
Additional
SQL Server
Data Source=localhost;Integrated Security=SSPI;Initial Catalog=MyDB

SSPI means the logged in user will be used to connect with windows integrated security.

Alternative you could provide the username and password
MySQL
Server=localhost;Database=MyDB;Uid=myUsername;Pwd=myPassword;
MySQL data adapter does not appear to support SSI

It is advised to use connection pooling when concurrent database access is required. You should check each vendor for how to set this up.

http://www.connectionstrings.com provides an excellent knowledge base for connecting to most databases in most technologies including ADO.NET.

Machine.Config

Though connection strings can be defined anywhere, it is good practice to place them within the machine.config. This can allow changing connection criteria including the database server without compiling code; great for having separate databases for development, testing and production.

Connection Strings should be placed between the appSettings and the connectionStrings node.

The connection string can be named with the name field for reference in code later.

The providers assembly is referenced with the providerName field.

SQL Server:

<connectionStrings>
<add name =”MyDatabase”
providerName=”System.Data.SqlClient”
connectionString = “Data Source=localhost; Integrated
Security=SSPI;Initial Catalog=MyDB”/>
</connectionStrings>

MySQL:

<connectionStrings>
<add name=”MyDatabase”
providerName=”MySql.Data.MySQLClient”
connectionString=”Server=localhost;Database=MyDB;User
ID=Me;Password=MyPassword” />
</connectionStrings>

ConfigurationManager

The ConfigurationManager provides access to the defined connections within the Machine.Config (and App.Config) files via the name provided. Above we defined a connection called MyDatabase:

var connectionDetails =
ConfigurationManager.ConnectionStrings["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;

Connecting To a Database

To connect to a database an instance of DbConnection should be initialised against the database vendor specific connection string.

The The DbProviderFactories can be used to create the database connection object. It takes the database provider name which we configured as part of the connection information.

DbProviderFactory dbFactory =
DbProviderFactories.GetFactory(providerName);

using (DbConnection connection = dbFactory.CreateConnection())
{
connection.ConnectionString = connectionString;
connection.Open();
}

DbConnection implements IDisposable; it should be contained within a using scope.

ConnectionStringBuilder

The ConnectionStringBuilder class can be used in conjunction with a connection string to help customise a connection string in code. The index method [] can be used to set key value pairs representing connection properties.

It can be initialised with a connection string:

var connectionDetails =
ConfigurationManager.ConnectionStrings ["MyDatabase"];

var providerName = connectionDetails.ProviderName;
var connectionString = connectionDetails.ConnectionString;

var dbFactory = DbProviderFactories.GetFactory (providerName);

var builder = dbFactory.CreateConnectionStringBuilder ();
builder.ConnectionString = connectionString;
builder ["ConnectionTimeout"] = 60;

using (var connection = dbFactory.CreateConnection ()) {
connection.ConnectionString = builder.ConnectionString;
connection.Open ();
}

Connected vs Disconnected Layers

ADO.NET provides two conceptual layers; connected and disconnected.

The connected layer requires a live connection to the database to remain open during interaction. It is used for short transactions to the database such as calling stored procedures, running schema update or writeable transactions. The layer can be used for reading data though as the connection remains open; it is advised to limit the usage for short use.

The disconnected layer allows populating a DataSet object which allows an offline representation of the data. The connection remains open only for the amount of time it takes to populate the required data into the DataSet. The dataset can then be manipulated independently without the connection being open. The data can even be persisted back to the database at a later date.

Multithreading

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,


Intro

Multithreading is the concept of creating and utilising multiple threads. There are a number of reasons for this:

* Split processor heavy work between all available processors to increase performance
* Prevent synchronous method calls from blocking the current thread:
* Web services
* Database Calls
* Prevent calls from the UI appearing unresponsive

ThreadStart & ParameterizedThreadStart

ThreadStart and the ParameterizedThreadStart class provide functionality for executing code within a new thread, with or without parameters.

ThreadStart

ThreadStart allows spawning code in a new thread for a parameterless delegate which returns void. Execution only starts once Start has been called upon the thread.

var aThread = new Thread(new ThreadStart(MyMethod))
{
    Name = "My Thread",     
    Priority = ThreadPriority.Highest
};      
    
aThread.Start();   

ParameterizedThreadStart

ParameterizedThreadStart allows spawning code in a new thread for a delegate which takes one parameter of type object and returns void. The object parameter is passed into the start method.

Where multiple parameters are required, these should be wrapped up in a class and passed in.

var aThread = new Thread (new ParameterizedThreadStart (MyMethod)) {
    Name = "My Thread",     
    Priority = ThreadPriority.Highest
};      

aThread.Start (new object());

Cancelling

A CancellationTokenSource instance can be used to cancel a thread.

CancellationTokenSource can be passed in directly, via a member variable of a user defined params object or simply made accessible to all required threads.

Cancel() is called upon the token if the thread should be cancelled.

var cts = new CancellationTokenSource ();

var aThread = new Thread (new ParameterizedThreadStart (MyMethod)) {
    Name = "My Thread",     
    Priority = ThreadPriority.Highest
};      

aThread.Start (cts.Token);

Thread.Sleep (10000);

cts.Cancel ();

The executing thread should monitor IsCancellationRequested or alternatively call ThrowIfCancellationRequested. ThrowIfCancellationRequested is semantically equal to throwing an OperationCanceledException when IsCancellationRequested returns true;

private void MyMethod (object obj)
{
    CancellationToken token = (CancellationToken)obj;


    for (int i = 0; i < Int32.MaxValue; i++) {    
        // Causes an exception to be raised if cancellation is called
        token.ThrowIfCancellationRequested ();


        // Alternatively check the token manually if cancellation has been raised
        if (token.IsCancellationRequested) {
            throw new OperationCanceledException (token);                
        }
    }
}

Timer Class

System.Threading.Timer allows periodic event signalling via the TimerCallback delegate.

The TimerCallback delegate returns void and takes a single parameter of type object.

A Timer is invoked with the delegate, the param object, the time in milliseconds before starting, and the time in milliseconds between events.

The following calls the method MyMethod with a params object after 2 seconds and every second afterwards:

var t = new Timer(
new TimerCallback(MyMethod), new object() , 2000, 1000); 

CLR ThreadPool

Threads are an expensive resource to spawn and terminate, as such the CLR maintains a pool of worker threads you can use.

All threads in the ThreadPool are background threads with a normal priority.

You can request work from a pool thread with the WaitCallback delegate. It takes one parameter of type object and returns void:

var wcb = new WaitCallback(DoMethod);    

The delegate is assigned via the ThreadPool class:

ThreadPool.QueueUserWorkItem(wcb, new object()); 

ThreadPool can help to ensure that the number of concurrent running threads is kept within a required tolerance, this can ensure that a server does not buckle over from trying to do too many things at one time.

Due to the nature of ThreadPool and the limited number of threads contained within, you can not guarantee the length of time before your process begins.

CancellationTokenSource can be used to cancel an executing thread on the ThreadPool.

CancellationTokenSource cts = new CancellationTokenSource ();
ThreadPool.QueueUserWorkItem (new WaitCallback (DoMethod), cts.Token);
cts.Cancel ();

static void DoMethod (object obj)
{
    CancellationToken token = (CancellationToken)obj;


    for (int i = 0; i < Int32.MaxValue; i++) {    
        token.ThrowIfCancellationRequested ();            
    }
}

Task Parallel Library

Microsoft introduced the Task Parallel Library (TPL) in .NET 4.0.

The library automatically splits up an applications workload across all available CPUs dynamically by requesting worker threads within the CLR thread pool. It provides the classes Parallel and Task.

Parallel

System.Threading.Tasks.Parallel can be used to perform an operation upon all elements in a collection or a custom counter.

The operations can take any of the System.Func or System.Action delegates.

Parallel does not guarantee the order of execution and as such should not be used when the order of execution matters.

Parallel should not be used upon small sets of data or when threads requires resources which require synchronization as this can actually decrease performance.

ForEach

To perform a delegate for each element in a collection:

Parallel.ForEach (new List<int>(), anElement => { 
    /*Some Process */ 
});

For

To perform a countered loop, where the iteration count i is available inside the iteration:

Parallel.For (0, Int32.MaxValue, i => { 
    /* Some Process */ 
});

Break and Stop

ParallelLoopState exposes Stop() and Break(). Break() will cause all iterations with an index less than the callers to finish before the loop is terminated. Stop will simply stop as soon as it is conveniently possible.

Parallel.For (0, Int32.MaxValue, (i, loopState) => { 
    loopState.Stop();
    loopState.Break();
    /*Some Process */ 
});

Invoke

Allows executing a series of operations, the CLR will attempt to do them in parallel.

If the operations modify a shared resource the operation will automatically not be executed in parallel.

Parallel.Invoke( 
    () => { /* Do something #1 */ }, 
    () => { /* Do something #2 */ }, 
    () => { /* Do something #3 */ }, 
    () => { /* o something #4  */ });

Task Class

A simple alternative to asynchronous delegates:

Task.Factory.StartNew (() => {
    DoSomething ();
});  

A task scheduler is responsible for starting and managing tasks. It will by default use threads from the ThreadPool.

The task can be created and started in separate steps:

var aTask = new Task (() => {
    DoSomething ();
});

aTask.Start ();

The wait function can be called to pause the current thread until the task has finished:

aTask.Wait ();

Alternatively the task can be run synchronously:

aTask.RunSynchronously ();

Results can be returned with the Result property. This will block the current thread until the thread has finished executing. Result can only be called on Task:

var aTask = new Task<bool> (() => {
    return true;
});

var result = aTask.Result;

Continuation Task

The ContinueWith method can be used to execute another process as soon as a Task finishes.

Task<bool> t = Task.Run (() => {
    return true;
}).ContinueWith ((x) => {
    return !x.Result;
}); 

var result = t.Result;

ContinueWith can also take a TaskContinuationOptions enum to denote when the method should run; OnlyOnRanToCompletion,OnlyOnFaulted and OnlyOnCanceled

Task<string> t = Task.Run (() => {
    return string.Empty;
}); 

t.ContinueWith ((i) => {
    return "OnlyOnCanceled";
}, TaskContinuationOptions.OnlyOnCanceled); 

t.ContinueWith ((i) => {
    return "OnlyOnFaulted";
}, TaskContinuationOptions.OnlyOnFaulted); 

t = t.ContinueWith ((i) => {
    return "OnlyOnRanToCompletion";
}, TaskContinuationOptions.OnlyOnRanToCompletion); 

var result = t.Result;

Child Tasks

Child tasks can also be assigned when running a Task. Here the parent task will not be deemed as finished until all child tasks have finished. Child Tasks are added with TaskCreationOptions.AttachedToParent.

  Task<int[]> parent = Task.Run (() => {
    var results = new int[2]; 
    new Task (() => results [0] = 0,                     
        TaskCreationOptions.AttachedToParent).Start ();                 
    new Task (() => results [1] = 1, 
        TaskCreationOptions.AttachedToParent).Start ();                       
    return results;
});             

var finalTask = parent.ContinueWith (
                 parentTask => {


        var count = 0;
        foreach (int i in parentTask.Result) {
            count += i;
        }
        return count;              
    });  

finalTask.Wait ();       
var finalResult = finalTask.Result;    

TaskFactory

A TaskFactory can be used to define settings for child tasks:

Task<int[]> parent = Task.Run (() => {

    var results = new int[2];                   


    TaskFactory tf = new TaskFactory (
                         TaskCreationOptions.AttachedToParent,                    
                         TaskContinuationOptions.ExecuteSynchronously);                 

    tf.StartNew (() => 0);                 
    tf.StartNew (() => 1);                 

    return results;
});               

var finalTask = parent.ContinueWith (
                    parentTask => { 
        var count = 0;
        foreach (int i in parentTask.Result) {
            count += i;
        }

        return count;                
    });

finalTask.Wait ();
var finalResult = finalTask.Result;    

WaitAll

WaitAll can be used to wait for all child tasks to finish executing:

var tasks = new Task[3];            
tasks [0] = Task.Run (() => {
    return 1;
});             

tasks [1] = Task.Run (() => { 
    return 2;
}); 

tasks [2] = Task.Run (() => {
    return 3;
}
);

Task.WaitAll (tasks);  

WaitAny

WaitAny causes the current thread to block until any task has completed processing.

Task<int>[] tasks = new Task<int>[3];

tasks [0] = Task.Run (() => {
    return 1;
});

tasks [1] = Task.Run (() => {
    return 2;
});

tasks [2] = Task.Run (() => {
    return 3;
});

int result = 0;

while (result < 6) { 
    int i = Task.WaitAny (tasks);
    var completedTask = tasks [i];                 

    result += completedTask.Result;

    var taskList = tasks.ToList ();
    taskList.RemoveAt (i);

    tasks = taskList.ToArray ();
}

WhenAll

WhenAll can be used to schedule a continuation task to run after all child tasks have finished executing:

var tasks = new Task<int>[3];

tasks [0] = Task.Run (() => {
    return 1;
});

tasks [1] = Task.Run (() => {
    return 2;
});

tasks [2] = Task.Run (() => {
    return 3;
});

int result = 0;
var completionTask = Task.WhenAll (tasks);

completionTask.ContinueWith (x => result += x.Result.Sum( y => y));    

Timing Out A Task

An overload of the WaitAny method takes a maximum wait time. This can be used to cancel a task after a set time period.

var longRunning = Task.Run (() => {
    Thread.Sleep (10000);
    return 1;
}); 

int index = Task.WaitAny (new[] { longRunning }, 1000); 

var result = 0;

if( longRunning.IsCompleted)
    result += longRunning.Result;

Canceling Tasks

Tasks can be called with a CancellationTokenSource.

When cancelling a task with a CancellationTokenSource it will appear to end as if no error has happened. If you require the task to error out upon cancellation then you should throw a OperationCanceledException. An alternative to catching the exception is to use a continuation task with the TaskContinuationOptions.OnlyOnCanceled option.

var cts = new CancellationTokenSource ();
var token = cts.Token;
var isCancelled = false;

Task<bool> task = Task.Run (() => {
    while (!token.IsCancellationRequested) {  
        Thread.Sleep (1000);
    }                       
    throw new OperationCanceledException ();
}, token).ContinueWith ((t) => { 
    return true;
}, TaskContinuationOptions.OnlyOnCanceled);

Thread.Sleep (3000);     

cts.Cancel ();

var result = task.Result;    

Async Keyword

.NET 4.5 now supports the Async keyword, the CLR can call any method asynchronously virtually automatically.

Methods requiring asynchronous calling are marked as async and return either Task or Task depending upon if they return type T or void.

In the following example we have two methods which are tagged as async. They each call a method which returns a Task or Task param.

public async Task<bool> RunWithReturn ()
{
    return await TaskWithReturn ();
}

public async Task RunNoReturn ()
{
    await TaskNoReturn ();
}

private Task<bool>  TaskWithReturn ()
{
    var t = new Task<bool> (() => {
        Thread.Sleep (1);
        return true;
    });

    t.Start ();
    return t;
}

private Task TaskNoReturn ()
{
    var t = new Task (() => {
        Thread.Sleep (1);
        IsSet = true;
    });

    t.Start ();
    return t;
}

These methods returning Task and Task are called as normal. They could be called like:

Task<bool> t1 = TaskWithReturn ();
Task t2 = TaskNoReturn ();

The method is not actually executed until the returned task is called with the await keyword:

var result = await t1;
await t2;

The method call and the await can be wrapped up in one line:

await TaskWithReturn ();
await TaskNoReturn ();

Asynchronous Delegates

Delegates allow asynchronous invocation by BeginInvoke() and EndInvoke()

BeginInvoke() and EndInvoke() are automatically generated by the compiler and have API based upon the parameters and return type of the delegate they were defined against.

BeginInvoke method:

* Takes the the parameters the delegate was defined against
* Takes an optional callback method
* Takes an optional state parameter of type object. This can be cast into any type inside the callback method.
* Returns an IAsyncResult which can help with getting the results back and also with working with the callback

EndInvoke method:

* Returns the same type the delegate was defined against
* Blocks the thread until the the delegate has finished executing

IAsyncResult provides various ways of waiting for the results:

* IsCompleted
* AsyncState
* AsyncWaitHandle
* CompletedSynchronously

IsCompleted

?

Returns true when the delegate has finished running:

delegate  bool AnAction(bool paramA, int param2 );

var d = new AnAction(ActionMethod);   

IAsyncResult ar = d.BeginInvoke(true, 1, null, null);   

do
{  
Thread.Sleep(10000);
 } while(!ar.IsCompleted);   

var result = d.EndInvoke(ar);

AsyncWaitHandle

Wraps up the IsComplete and Sleep from above into one handy function:

var d = new AnAction (DoAction);  
IAsyncResult ar = d.BeginInvoke (true, 1, null, null);   

do {
} while (!ar.AsyncWaitHandle.WaitOne (1000, true));

var result = d.EndInvoke (ar);

AsyncCallback With State

Triggers a callback delegate upon completion.

The callback will be on the same thread as the delegate and not on the main thread.

Here we create a new Foo object which will have all the parameters of the callback wrapped up into one class.

var d = new AnAction(DoAction);  

var ar = d.BeginInvoke(true, 10, 
    new AsyncCallback(MyCallback), 
    new object());

public void MyCallback(IAsyncResult iar) 
{   
    AsyncResult ar = (AsyncResult)iar;   
    var d = (AnAction)ar.AsyncDelegate;
    var result = d.EndInvoke(ar);
    var obj = (object)ar.AsyncState;
} 

Cancelling An Asynchronous Delegate

Cancellation of an executing asynchronous delegate can be performed with the CancellationTokenSource ( See cancelling Tasks).

Parallel LINQ Queries (PLINQ)

PLINQ libraries allow an easy API for running queries in a parallel manner.

PLINQ is requested in code though the CLR will decide if parallel execution will beneficial or not.

System.Linq.ParallelEnumerable contains all the extension methods for PLINQ.

Spawning

Requesting parallel execution is as simple as calling the Parallel method.

Extension methods:

var data = myData.AsParallel().Select( x => x );

Natural methods:

var data = ( from value in myData.AsParallel() select value );

WithDegreeOfParallelism

PLINQ will normally use all available processors to process a query though never more than 64. WithDegreeOfParallelism can be used to restrict the number processors that it it will consume.

var data = 
myData.AsParallel ().WithDegreeOfParallelism(2).Select (x => x);

AsOrdered

PLINQ does not guarantee the order of processing the iterations.

AsOrdered can be called to instruct PLINQ to preserve the ordering within the result set. The query will still be run in parallel.

var data = myData.AsParallel ().Select (x => x).AsOrdered();

AsSequential

AsSequential can be used to force parts of a Linq query to be executed sequentially and not in parallel. Other parts of the query will still be performed in parallel.

var data = myData.AsParallel ().Select (x => x).AsSequential();

ForAll

ForAll allows parallel iteration through a collection. Order is not guaranteed; it will remove any sort order upon a collection and will iterate through each member when the item is available.

var myData = new List<int> () { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

var ints = new List<int> ();

myData.AsParallel ().ForAll (x => ints.Add (x));

Errors

PLINQ will throw an AggregateException when any errors occur within the query when processing. All iterations will be performed; not matter how many errors occur.

Cancelling

The CancellationTokenSource can be used to cancell an executing PLINQ query. This is passed in with the WithCancellation method:

var cs = new CancellationTokenSource ();

Extension methods:

var myData = new List<int> () { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var data = myData.AsParallel ().
          Select (x => x).WithCancellation (cs.Token);

Natural Linq:

var myData = new List<int> () { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
var data =
    (from Value in myData.AsParallel ().WithCancellation (cs.Token)
     select Value);

On the original thread you can call Cancel upon the token:

cs.Cancel ();

WithExecutionMode

WithExecutionMode can be used for force PLINQ into a parallel query though this should be used with care.

myData.AsParallel ().Select (x => x).
WithExecutionMode (ParallelExecutionMode.ForceParallelism);

File IO

In high latency environments, asynchronous file IO can provide performance gains by releasing threads until the file system responds:

public async Task CreateAndWriteAsyncToFile (byte[] data, string aFileName)
{
    using (FileStream stream = new FileStream (aFileName, 
                                   FileMode.OpenOrCreate,         
                                   FileAccess.Write, 
                                   FileShare.Read, 1024 * 4, true)) {         
        await stream.WriteAsync (data, 0, data.Length);     
    }
}

public async Task<byte[]> ReadFileAsync (string aFileName, int length)
{
    byte[] data = new byte[length];


    using (FileStream stream = new FileStream (aFileName, 
                                   FileMode.Open,         
                                   FileAccess.Read, 
                                   FileShare.Read, 1024 * 4, true)) {         
        await stream.ReadAsync (data, 0, data.Length);     
    }

    return data;
}

Networking

Asynchronous network downloads can be achieved with the GetStringAsync method.

The GetStringAsync uses asynchronous code internally and returns a Task to the caller that will finish when the data is retrieved:

Seeing as GetStringAsync returns a Task instance WhenAll can be used to register a callback when multiple instances are returned:

public async Task<string> ReadAsyncHttpRequest ()
{     
    HttpClient client = new HttpClient ();     
    return await client.GetStringAsync ("http://www.google.com");
}

HttpClient can be found within the System.Net.Http namespace which is found within the assembly System.Net.Http.dll.

public async Task<string> ExecuteMultipleRequestsInParallel ()
{     
    HttpClient client = new HttpClient ();     

    Task<string> one = client.GetStringAsync ("http://www.google.co.uk/");     
    Task<string> two = client.GetStringAsync ("http://monodevelop.com/");     

    await Task.WhenAll (one, two); 

    return one.Result + two.Result;
}

Thread Synchronization

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,


Concurrency

Concurrency exists due to multiple threads touching the same resource at the same time.

A dirty read occurs when the state of one entity is read by one thread while another thread has started but not finished changing it.

Dirty reads can be prevented by locking regions of code to be accessed by only one thread at a time. This process is called thread synchronisation and unfortunately can lead to deadlock and resource starvation.

Deadlock occurs when two threads have exclusive access to a synchronised resource that is required by the other; it is a stalemate situation.

Resource starvation occurs when one thread is continuously denied access to a resource it is trying to access.

Thread synchronization prevents dirty reads. There are many ways to achieve this in .NET.

Lock Regions

The lock region defines a region of code where only one thread can access at a time.

public class LockRegion
{
    private System.Object lockControl = new System.Object ();

    public void AMethod ()
    {
        lock (lockControl) {
            // Access to this section permitted to only one thread at a time.
            // read or set shared state here etc.
        }
    }
}

The parameter passed into the lock method defines the scope of thread synchronisation. It can be any reference type. To avoid deadlock it is advised to avoid:

* Using the same lock objects between threads
* Public members and any entities which will not be defined as unique
* Do not lock on ‘this’; it could be used by code external to your class.
* Strings should not be used as lock objects due to string-interning. This is where one string is only actually generated internally for multiple string instances with the same value.

A thread is paused if it cannot get unique access to a region of code; i.e If another thread is running in the region. It will be resumed when the region is free where it will try again to get unique access.

ThreadInterruptedException is thrown if a thread is interrupted while waiting to enter a lockregion.

Monitor Class

The Lock keyword is a shortcut to System.Thread.Monitor class. Upon entering a lock region Monitor.Enter() is called while Monitor.Exit() is called upon leaving.

The lock example previously could be rewritten with the monitor class:

      
private System.Object lockControl = new System.Object ();

public void LockRegion ()
{
    try {

        Monitor.Enter (lockControl);

        try {

            // Access to this section permitted to only one thread at a time.
            // Read or set shared state here etc. 
        } finally {
            Monitor.Exit (lockControl);
        }
    } catch (SynchronizationLockException ex) {
    }
}

Accessing Monitor directly allows greater control over what happens to a thread when requesting or being denied unique access to a lock region.

TryEnter

Monitor.TryEnter() returns a boolean indicating if the request for lock was successfully acquired. This can be useful to prevent the thread being paused when access is denied, such as is the case with lock and Monitor.Enter. It has an override for the maximum wait time in milliseconds before it returns false:

public void TryEnterExampple ()
{
    if (Monitor.TryEnter (lockControl, 100)) {
        try {
        } finally {
            Monitor.Exit (lockControl);
        }     
    } else {
        // Required logic for failure to get unique access.    
    }
}

Pulse & Wait

The Monitor.Wait() method allows a thread to release unique access to a locked region of code while it is still inside. The thread will be paused and another thread will gain unique access. Once the other thread has released the lock the paused thread is free to try and gain a unique lock again.

Wait provides an override which can define the minimum amount of time in milliseconds it will wait before trying to get unique access once again; i.e the minimum time it will vacate the region.

Pulse and PulseAll can be called by a thread which has unique access to a locked region. It does not pause the thread but rather signals that it is about to leave the locked region.

Interlocked Class

System.Threading.Interlocked allows synchronization of threads by exposing a series of atomic operations which operate on numeric member fields.

Increment, decrement and addition operations:

Interlocked.Increment (ref aNumber);

Interlocked.Decrement (ref aNumber);

Interlocked.Add (ref aNumber, 10);

The exchange function can be used to swap two entities:

Interlocked.Exchange (ref usingResource, 1);

The exchange function can be used to provide a manual lock mechanism:

private int usingResource = 0;

if (Interlocked.Exchange (ref usingResource, 1) == 0) {     
    // unlock by setting back to 0
    Interlocked.Exchange (ref usingResource, 0); 
}    

ThreadStatic

The ThreadStatic attribute can be used to mark member fields as unique to a thread:

[ThreadStatic] static double previous = 0.0;

ThreadLocal

.NET Framework 4 offers a ThreadLocal.

Local variables defined with ThreadLocal will be not only be unique and private to each thread, they are also initialized lazily upon first consumption within each thread.

var threadLocal = new ThreadLocal<int> (() => {
    return 1;
});

var isIsValueCreated = threadLocal.IsValueCreated;
var value = threadLocal.Value;

Synchronization Attribute

The Synchronization Attribute allow automatic thread synchronization at class or method level.

The Synchronization Attribute can be found within the namespace System.Runtime.Remoting.Contexts.

The CLR places the class into its own AppDomain Context, though you need to inherit from ContextBoundObject:

[Synchronization] 
public class SynchronizationAttributeExample : ContextBoundObject
{
}

Thread synchronization is handled by the CLR however there is not much control as all the code within the scope of the attribute is synchronized which is not good for performance.

Thread.Join

Thread.Join can be used to pause a thread until a spawned thread has finished executing. The calling thread will be blocked until a spawned thread terminates, if it has already terminated it will return immediately.

Thread t1 = new Thread (() => { 
    Thread.Sleep (1000);
});

Thread t2 = new Thread (() => { 
    Thread.Sleep (2000);
});

t1.Start ();
t2.Start ();

t1.Join ();
t2.Join ();

AutoResetEvent

AutoResetEvent allows coordinating multiple threads by pausing and resuming them. This can allow them to synchronise their work.

The AutoResetEvent class can coordinate a main thread and a new thread by passing signals between the two threads.

Calling WaitOne on the secondary thread will pause the thread until the main thread has signalled with the Set method.

Set does not cause the main thread to pause but rather allows the secondary thread to resume.

private AutoResetEvent autoResetEvent = new AutoResetEvent (false);

void Main ()
{
    var aThread = new Thread (
                      new ThreadStart (OtherThreadMethod));

    aThread.Start ();
    autoResetEvent.Set ();        

    // some waiting logic would be in here
    Thread.Sleep (100);

    autoResetEvent.Set ();
}

void OtherThreadMethod ()
{
    autoResetEvent.WaitOne (); // pauses until set is called by main thread
    autoResetEvent.WaitOne ();
}

AutoResetEvent takes a boolean which determines if the event starts of as signaled (having Set called). If true the secondary thread will execute immediately, if false it will not start executing until Set is called.

Processes, Threads, AppDomains And Object Contexts

Source Code

All source code can be found on GitHub here.

This is part of my HowTo in .NET seriies. An overview can be seen here,

Processes

Any code executing on a windows platform is wrapped up in a process.

Each process is given a unique PID which can be seen within the Task Manager.

If the code entity being run is determined to be a .NET assembly, an instance of the CLR is created to run the assembly.

Each instance of the CLR receives its own managed heap.

Prior to .NET V4 a process could only host one instance of the CLR. Though this has now been changed, each instance of the CLR still receives its own instance of a managed heap memory allocation.

Each .NET process is run in isolation from other processes; they cannot access each other’s resources such as memory. The separation means that if one process errors it does not bring down other running processes.

Querying Running Processes

The Process class provides a number of static methods for retrieving running processes on either a local or remote computer.

Get all locally running processes on a local machine:

Process[] pids = Process.GetProcesses();

Get all instances of a running application by its name:

Process [] pids = Process.GetProcessesByName("MyApp");

Get all instances of a running application on a remote machine either by the machine name or by IP address:

Process [] pids = Process.GetProcessesByName("MyApp", "Server");


Process [] pids = Process.GetProcessesByName("MyApp", "1.2.3.4");

Get a process by its process id:

Process aProcess = Process.GetProcessById(123);

Retrieve the process name of a retrieved Process instance:

var aName = aProcess.ProcessName;

Starting & Killing Processes

A process can be killed by simply calling Kill on the process instance:

aProcess.Kill ();

The Process class offers the static Start method to start a process by an executable name.

Process.Start("AnApp.exe");

Command line parameters can be passed to the application:

Process.Start("AnApp.exe", "parameter");

Process can take a number of configurations:

Process.Start (new ProcessStartInfo ("AnApp.exe", "AParameter") { 
    CreateNoWindow = true, 
    WorkingDirectory = @"c:\" 
});

Threads

A thread is the smallest sequence of instructions that can be independently managed by an operating system and therefore processed by a processor.

Before the luxury of multi core systems only one thread could be executed at a time. Multiple threads would have to share processor time. Concurrent thread execution would seemingly be achieved by continuously pausing and resuming each running thread on the system.

In todays world of multi core systems multiple threads can genuinely run simultaneously, however seeing as there can be more threads than processors, the same principle holds true.

With multiple threads running on a computer, the processor will give more processor time to threads marked as having a higher priority.

When a Process is started, a default thread is created or spawned which will execute the code associated with the assembly.

Each thread receives it’s call stack; all local variables are placed on the stack and as such are private to the thread.

Thread Members

Code can get access to the thread that is running it with the CurrentThread property:

var aThread = Thread.CurrentThread;

Access to all threads running within a process via the Threads property of a Process instance:

var threads = Process.GetProcesses () [0].Threads;

Determine if the thread is still alive; that it has been started and has not terminated normally or aborted:

var isAlive = aThread.IsAlive;

The name of the thread can be set and retrieved with the name property. This can be helpful to identify which thread is which; the name is shown within the Threads debug window:

aThread.Name = "Thready McFreddy";
var aName = aThread.Name;

Foreground & Background Threads

The CLR has the concept of foreground and background threads.

Foreground Threads can prevent the CLR terminating or unloading an AppDomain.

Background threads can not prevent the CLR from terminating.

A thread can be switched between foreground and background with the IsBackground property:

aThread.IsBackground = true;

Thread Priority

A thread’s priority can be changed to request more or less processing time from the CLR. The higher the priority the more time a thread will be given from a processor. The default priority is normal:

* Lowest
* BelowNormal
* Normal
* AboveNormal
* Highest

To get a threads priority:

var priority = aThread.Priority;

The thread priority can be changed at any point in the threads existence.

aThread.Priority = ThreadPriority.Highest;

Thread State

The state of a thread can be determined via the ThreadState property which returns a member of the ThreadState enum:

var threadState = aThread.ThreadState;

Name
Description
Aborted
The thread has been aborted or requested to be aborted . The thread is dead however its status has not changed to stopped.
AbortRequested
The thread has been requested to abort by Thread.Abort method but it has not received the System.Threading.ThreadAbortException which will terminate it.
Background
The thread is a background thread.
Running
The thread is running, is not blocked and no attempt to terminate has been made.
Stopped
The thread has been completely terminated or stopped.
StopRequested
A request to stop the thread has been made.
Suspended
The thread is suspended.
SuspendRequested
A request to suspend the thread has been made.
Unstarted
The thread has not been started via the Thread.Start method.
WaitSleepJoin
The thread is paused due to either a Thread.Sleep or because a request to get exclusive access to a locked region has been made.

Thread Abort

Thread.Abort can be used to stop an executing thread. It will cause a ThreadAbort-Exception to be thrown. Caution should be used when calling this method as it can leave a thread and it’s data in a corrupt state.

Application Domains

An application domain is a segmentation within a process; it provides a level of isolation and security within each process.

AppDomains abstract the OS from the executing code.

Threads are actually associated with a single AppDomain and not a Process.

The CLR creates a default AppDomain and its default Thread as part of executing a .NET assembly.

A Process can contain multiple AppDomains and AppDomains can contain multiple Threads; however these will only exist if if they are created manually by executing code within the Process.

AppDomains are less expensive than Processes; the CLR can load or unload AppDomains quicker than a processes can be created or destroyed.

When an Assembly is loaded it is loaded into an AppDomain. Code within an assembly can only be executed within the AppDomain it was loaded into.

A thread can only run inside one AppDomain at a time though it can be moved between AppDomains when it is seen fit by the CLR, this is to allow thread reuse. A thread can only execute code associated to the AppDomain it is currently running in.

AppDomain Members

Executing code can get access to the domain it is running in via the CurrentDomain property:

AppDomain aAppDomain = AppDomain.CurrentDomain;

An AppDomain instance exposes properties describing itself:

var id = aAppDomain.Id;

var name = aAppDomain.FriendlyName;

var isDefault = aAppDomain.IsDefaultAppDomain ();

var dir = aAppDomain.BaseDirectory;

GetAssemblies lists all .NET assemblies which an AppDomain has loaded into it:

Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies (););

Creating & Loading AppDomains

AppDomains can be created:

var aDomain = AppDomain.CreateDomain("MyDomain"); 

Loading and executing assemblies into AppDomains:

AppDomain.Load ("Foo");

AppDomain.ExecuteAssembly ("Foo.exe");

An AppDomain allows subscribing to the AssemblyLoad event which is triggered when an assembly is loaded into it:

aaAppDomain.AssemblyLoad += (o, s) => {
            };   

Unloading AppDomains

Individual assemblies can not be unloaded though AppDomains can:

AppDomain.Unload(aAppDomain);

An AppDomain allows subscribing to the DomainUnload event which is triggered when it is being unloaded:

aAppDomain.DomainUnload += (o, s) => { };    

Process.ProcessExit event is triggered when the Process is exited:

aAppDomain.ProcessExit += (o, s) =>   { };

Object Context Boundaries

An AppDomain splits itself into contexts. The CLR groups types into groups depending upon their context needs.

Most types are placed into the default context; this is referred to as context 0 and the objects contained within as context-agile objects.

CLR creates new contexts when a loaded type requires a new context boundary. Objects requiring thread synchronization typically require this process. This is covered in the the multithreading section.

Creating a Context Bound Object

Objects requiring special context needs must derive from System.ContextBoundObject base.

public class ContextBoundObjectExample : ContextBoundObject
{
}

Inspecting An Object’s Context

var context = Thread.CurrentContext;

var contextId = context.ContextID;

var contextProperties = context .ContextProperties;

var propname = contextProperties[0].Name;