Wednesday, January 25, 2012

CsvHelper

In the last six months I've been writing a couple of tools that read data from a CSV file, calculate statistics on it, and write the results to another CSV file. The input file was usually generated by a website (AdWords, AdSense, and so on), and the output file was meant to be consumed by a SQL bulk insert or simply viewed in Excel.

All of these requirements meant that I had to be able to read and write proper CSV files.

Using Josh Close's CsvHelper made this mission really simple. For example, this is basic reading code:

public IEnumerable<Input> Read(string fileName)
{
    using (var fileReader = File.OpenText(fileName))
    using (var csvReader = new CsvHelper.CsvReader(fileReader))
    {
        // Records are yielded lazily, so the file stays open until enumeration finishes.
        while (csvReader.Read())
        {
            var record = csvReader.GetRecord<Input>();
            yield return record;
        }
    }
}

Notice the elegant ORM-style mapping: just define the columns you wish to read as properties of a data class, and the reader does all the mapping for you. (Mapping the data yourself is also possible, and I even did that once for a generic reader, but why bother?)
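As a rough illustration, a data class for the reader above might look like the sketch below. The property names (Campaign, Clicks, Cost) are hypothetical and are not taken from a real AdWords export; the sketch also assumes the file has a header row whose column names match the property names.

```csharp
using System;

// Hypothetical data class for illustration only.
// Assumption: each public property corresponds to a column
// with the same name in the CSV header row.
public class Input
{
    public string Campaign { get; set; }
    public int Clicks { get; set; }
    public decimal Cost { get; set; }
}
```

Columns in the file that have no matching property would simply not appear in the class.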

And here is a writing example:

// Generic, so each record is written with its concrete type rather than as object.
public void Write<T>(string fileName, IEnumerable<T> records) where T : class
{
    using (var textWriter = File.CreateText(fileName))
    using (var writer = new CsvWriter(textWriter))
    {
        foreach (var record in records)
        {
            writer.WriteRecord(record);
        }
    }
}


You can even control the column name and index with the CsvFieldAttribute:


public class Record
{
    [CsvField(Name="very important data")]
    public string SomeData { get; set; }

    [CsvField(Index = 4)]
    public int SomeNumericData { get; set; }
}


Cool, isn't it?

There are a few things I wish to point out:
1. CsvReader doesn't need to know about all the columns in the file, only the ones you wish to read (ActiveRecord, try to learn from this...:-)).
2. In the past, encoding was not controlled by the consumer of the library. We had a problem with files coming from foreign servers with a different encoding, because CsvReader was using CultureInfo.CurrentCulture. Luckily this is no longer so: you can specify "UseInvariantCulture" in CsvConfiguration.
3. CsvWriter always adds a newline at the end of each file. Because of our bulk insert validation, we needed to remove that trailing newline. Since I had some time back then, I wrote this simple decorator over TextWriter, which holds back each line and writes the final one without a trailing newline:

using System.IO;
using System.Text;

public class CustomTextWriter : TextWriter
{
    private readonly TextWriter baseWriter;
    private string lastLine;   // most recent line, held back until the next one arrives
    private bool disposed;

    public CustomTextWriter(string fileName)
    {
        baseWriter = new StreamWriter(fileName, false, Encoding.Unicode);
        disposed = false;
    }

    public override Encoding Encoding
    {
        get { return Encoding.Unicode; }
    }

    // Emit the previously buffered line with its terminator and buffer the new one.
    public override void WriteLine(string value)
    {
        if (lastLine != null)
            baseWriter.WriteLine(lastLine);
        lastLine = value;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposed)
            return;
        // The final line is written without a trailing newline.
        if (lastLine != null)
            baseWriter.Write(lastLine);
        baseWriter.Dispose();
        base.Dispose(disposing);
        disposed = true;
    }
}
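To see the effect in isolation, here is a minimal sketch of the same idea that wraps any TextWriter, so it can be tried in memory with a StringWriter. The names LastLineBufferingWriter and LastLineDemo are made up for this example, and, like the class above, it assumes the caller emits each record through a single WriteLine(string) call.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical variant of the decorator above that wraps any TextWriter.
// Assumption: callers only use WriteLine(string); other Write overloads
// are not forwarded in this sketch.
public class LastLineBufferingWriter : TextWriter
{
    private readonly TextWriter inner;
    private string lastLine;
    private bool disposed;

    public LastLineBufferingWriter(TextWriter inner)
    {
        this.inner = inner;
    }

    public override Encoding Encoding
    {
        get { return inner.Encoding; }
    }

    // Hold back the newest line; emit the previous one with its terminator.
    public override void WriteLine(string value)
    {
        if (lastLine != null)
            inner.WriteLine(lastLine);
        lastLine = value;
    }

    protected override void Dispose(bool disposing)
    {
        if (!disposed)
        {
            // The final line is written without a trailing newline.
            // The inner writer is left open; the caller owns it.
            if (disposing && lastLine != null)
                inner.Write(lastLine);
            disposed = true;
        }
        base.Dispose(disposing);
    }
}

public static class LastLineDemo
{
    // Writes the given lines through the decorator and returns the raw text.
    public static string Render(params string[] lines)
    {
        var buffer = new StringWriter();
        using (var writer = new LastLineBufferingWriter(buffer))
        {
            foreach (var line in lines)
                writer.WriteLine(line);
        }
        return buffer.ToString();
    }
}
```

Since CsvWriter in the writing example takes a TextWriter, a decorator like this (or the CustomTextWriter above) can presumably be passed to its constructor in place of File.CreateText(fileName), with no change to the library itself.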


Well, I hope this has been helpful and that you might use it yourself :-). If you do, you are more than welcome to share in the comments section.

4 comments:

  1. Your CustomTextWriter class is cool, but how do I use this with CsvHelper? Do I have to edit the source and recompile?

  2. This is what I was looking for .
    Thanks
