Interesting benchmark: Reading lines from a text file

The obvious way to read lines from a text file is simply to open it and call ReadLine until you get nothing back:


    StreamReader r = File.OpenText(fileName);
    while (true)
    {
        string thisLine = r.ReadLine();
        if (thisLine == null)
            break;
        // Process the line here
    }
    r.Close();


Reading a million lines from a text file this way, on my system, takes about 1130 milliseconds. Not bad, but I was wondering whether it would be faster to read from a memory buffer instead of from a file, so I tried this test:


    r = File.OpenText(fileName);
    string allLines = r.ReadToEnd();
    StringReader sr = new StringReader(allLines);
    while (true)
    {
        string thisLine = sr.ReadLine();
        if (thisLine == null)
            break;
        // Process the line here
    }
    sr.Close();
    r.Close();


Interestingly enough, this takes over 200 milliseconds longer (around 1380 milliseconds). I wonder why. You'd think that doing a single read into a buffer would be cheaper than doing a lot of smaller reads (my test file is megabytes in size, so the first test wouldn't be buffering the entire file). It turns out that in the second test, over 700 ms is spent in the r.ReadToEnd call. Best I can figure, ReadToEnd is doing character set conversions or something similar on the whole buffer as it reads it. To eliminate that, I tried reading the raw bytes into a memory buffer and then extracting the strings from that:


    FileStream fs = File.OpenRead(fileName);
    byte[] memBuffer = new byte[fs.Length];
    // Note: a single Read call is not guaranteed to fill the whole buffer
    fs.Read(memBuffer, 0, memBuffer.Length);
    fs.Close();
    StreamReader sr = new StreamReader(new MemoryStream(memBuffer));
    // Report how long the load took (startms is assumed to be set the same way before the read)
    endms = (DateTime.Now.Second * 1000 + DateTime.Now.Millisecond);
    Console.WriteLine(" took {0}", endms - startms);
    while (true)
    {
        string thisLine = sr.ReadLine();
        if (thisLine == null)
            break;
        // Process the line here
    }
    sr.Close();


The end result? It's still slower than the first method, although by only a small amount (about 40 milliseconds slower). The moral of this story: The obvious way of reading lines from a text file is also the fastest. It probably got the most attention during optimization since that's what most people would use.
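If you want to repeat the comparison, here is a minimal sketch of a timing harness. It is not the code behind the numbers above: it assumes System.Diagnostics.Stopwatch is available (.NET 2.0 or later), and the class and method names (LineReadTiming, TimeDirect, TimeBuffered) are made up for illustration. It times the plain ReadLine loop, then times the ReadToEnd call and the StringReader loop separately so you can see where the second method spends its time.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical harness; the names here are illustrative, not from the original tests.
public static class LineReadTiming
{
    // Drains the reader one line at a time and returns how many lines it produced.
    public static int CountLines(TextReader reader)
    {
        int count = 0;
        while (reader.ReadLine() != null)
        {
            count++;
        }
        return count;
    }

    // Method 1: ReadLine straight from the file. Returns elapsed milliseconds.
    public static long TimeDirect(string fileName, out int lineCount)
    {
        Stopwatch watch = Stopwatch.StartNew();
        using (StreamReader r = File.OpenText(fileName))
        {
            lineCount = CountLines(r);
        }
        return watch.ElapsedMilliseconds;
    }

    // Method 2: ReadToEnd, then ReadLine from a StringReader.
    // The load phase and the line-splitting phase are timed separately.
    public static void TimeBuffered(string fileName, out long loadMs,
                                    out long splitMs, out int lineCount)
    {
        Stopwatch watch = Stopwatch.StartNew();
        string allLines;
        using (StreamReader r = File.OpenText(fileName))
        {
            allLines = r.ReadToEnd();
        }
        loadMs = watch.ElapsedMilliseconds;

        watch = Stopwatch.StartNew();
        using (StringReader sr = new StringReader(allLines))
        {
            lineCount = CountLines(sr);
        }
        splitMs = watch.ElapsedMilliseconds;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: LineReadTiming <text file>");
            return;
        }

        int lines;
        long directMs = TimeDirect(args[0], out lines);
        Console.WriteLine("direct ReadLine: {0} lines in {1} ms", lines, directMs);

        long loadMs, splitMs;
        TimeBuffered(args[0], out loadMs, out splitMs, out lines);
        Console.WriteLine("ReadToEnd: {0} ms, StringReader loop: {1} ms ({2} lines)",
                          loadMs, splitMs, lines);
    }
}
```

Run it a few times against the same file before trusting the numbers. The first pass usually pays for pulling the file off disk, and later passes are served from the operating system's cache.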


Note to self:  Edit posts using Internet Explorer.  Posting them with Firefox for some reason causes them to lose all their formatting.  Doh!