Author:
For the Python programming language, Pandas is an efficient and popular data analysis tool, especially its Dataframe, used to manipulate and display data. For the .NET programming languages, we can use Deedle or Microsoft.Data.Analysis package available in Nuget which also provides a DataFrame class used to manipulate, transform, and display data.
This example focuses on Microsoft.Data.Analysis package by demonstrating some basic features of the DataFrame class in Jupyter Notebook.
It also uses the XPlot.Plotly package which is F# data visualization package to plot charts for the data in the Dataframe.
Prerequisite
To run examples in this article, please refer to this Using .NET Core in Jupyter Notebook article for setting up Jupyter Notebook to support the .NET programming languages
Install the package
The Microsoft.Data.Analysis package is available in Nuget so the dotnet-interactive #r magic command can be used to install the package from Nuget.
Run the below command to install Microsoft.Data.Analysis package version 0.4.0.
#r "nuget:Microsoft.Data.Analysis,0.4.0"
Refer the namespaces
This article uses classes from the following four packages. Therefore, it uses the using statement to refer to those packages.
- XPlot.Plotly: A cross-platform data visualization package for the F# and .NET programming languages
- Microsoft.Data.Analysis: An easy-to-use and high-performance libraries for data analysis and transformation
- System.Linq: Classes and interfaces that support queries that use Language-Integrated Query
- Microsoft.AspNetCore.Html: Types for manipulating HTML content
using XPlot.Plotly;
using Microsoft.Data.Analysis;
using System.Linq;
using Microsoft.AspNetCore.Html;
This can be overridden by registering the custom formatter for the DataFrame. The below code registers custom formatters for the Dataframe and DataFrameRow to render the data in an HTML table.
It only displays the first 100 rows. This can be changed by modifying the value of the take variable.
Formatter<DataFrame>.Register((df, writer) =>
{
var headers = new List<IHtmlContent>();
headers.Add(th(i("index")));
headers.AddRange(df.Columns.Select(c => (IHtmlContent) th(c.Name)));
var rows = new List<List<IHtmlContent>>();
var take = 100;
for (var i = 0; i < Math.Min(take, df.Rows.Count); i++)
{
var cells = new List<IHtmlContent>();
cells.Add(td(i));
foreach (var obj in df.Rows[i])
{
cells.Add(td(obj));
}
rows.Add(cells);
}
var t = table(
thead(
headers),
tbody(
rows.Select(
r => tr(r))));
writer.Write(t);
writer.Write(df.Rows.Count + " x "+df.Columns.Count);
}, "text/html");
Formatter<DataFrameRow>.Register((dataFrameRow, writer) =>
{
var cells = new List<IHtmlContent>();
cells.Add(td(i));
foreach (var obj in dataFrameRow)
{
cells.Add(td(obj));
}
var t = table(
tbody(
cells));
writer.Write(t);
}, "text/html");
Create the DataFrame
DataFrameColumn
A DataFrame can be created by passing the list of DataFrameColumn objects to the DataFrame's constructor.
public DataFrame(params DataFrameColumn[] columns); public DataFrame(IEnumerable<DataFrameColumn> columns); |
The following code creates a DataFrame that has 200 rows and 2 columns. The first column contains dates and the second column contains random integer numbers. It calls the PrimitiveDataFrameColumn constructor to create the DataFrameColumn instances.
var start = new DateTime(2009,1,1);
Random rand = new Random();
var numDataPoint = 200;
PrimitiveDataFrameColumn<DateTime> date = new PrimitiveDataFrameColumn<DateTime>("Date",
Enumerable.Range(0, numDataPoint)
.Select(offset => start.AddDays(offset))
.ToList());
PrimitiveDataFrameColumn<int> data = new PrimitiveDataFrameColumn<int>("Data",
Enumerable.Range(0, numDataPoint)
.Select(r => rand.Next(100))
.ToList());
var df = new DataFrame(date, data);
df
CSV File
The DataFrame can also be created from a CSV file by calling the DataFrame.LoadCsv static method.
The following code creates a DataFrame from the ohcldata.csv file. This file is downloaded from 5.30. Example - Basic OHLC (Open, High, Low, Close) Financial Plot website. This file contains the daily Open, High, Low, Close financial data.
var df1 = DataFrame.LoadCsv("ohlcdata.csv");
df1
Then, the Info method can be used to generate a summary of each column in the DataFrame.
df1.Info()
df[0,1]
After that, a new value can be assigned to the DataFrame.
The below code increases the data in the first row and the second column by 10.
df[0,1] = int.Parse(df[0,1].ToString()) + 10;
df.Head(10)
df1.Rows[9]
The column index can also be used to access the specific column in the row.
The below accesses the fourth column in the tenth row.
df1.Rows[9][5]
Then, the new value can also be assigned to the column.
The below code assigns 50000000 to the sixth column.
df1.Rows[9][5] = 50000000f;
df1.Head(10)
//df.Columns["Data"] or df.Columns[1]
df.Columns["Data"]
The data in the column can be changed by using the DataFrame's overloaded operators.
The below code increases all data in the column by ten.
df.Columns["Data"]= df.Columns["Data"]+10;
df
df.Columns.Add(new PrimitiveDataFrameColumn<int>("Data1", df.Rows.Count()));
df
The data in the new column is set to null.
The following code fills null values in the new column (Data1) with 10.
df.Columns["Data1"].FillNulls(10, true);
df
df.Append(new List<KeyValuePair<string, object>>() {
new KeyValuePair<string, object>("Date", DateTime.Now),
new KeyValuePair<string, object>("Data", 12),
new KeyValuePair<string, object>("Data1", 50)
}, true);
df.Tail(10)
df.OrderBy("Data")
var groupByData = df.GroupBy("Data");
groupByData.Count().OrderBy("Data")
df.Filter(df.Columns["Data"].ElementwiseGreaterThan(50))
Merge the DataFrame
The Merge method can be used to merge two DataFrames with a database-style join.
The following code joins two DataFrames by using the Date column contained in both DataFrames. First, it converts the data type in the Date column of the df1 from the string type to the DataTime type. Then, it calls the Merge method to join the DataFrames.
df1.Columns["Date"] = new PrimitiveDataFrameColumn<DateTime>("Date",
df1.Columns["Date"]
.Cast<object>()
.ToList()
.Select(x => DateTime.ParseExact(x.ToString(), "yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture))
.Cast<DateTime>());
df1.Merge<DateTime>(df, "Date", "Date")
Plot charts by using XPlot.Ploty
XPlot.Ploty is a cross-platform data visualization package for the F# and .NET programming languages. It is based on Plotly which is the popular JavaScript charting library.
The following examples demonstrate how to use XPlot.Ploty to plot charts by using the data in the DataFrame.
Line chart
The following code plots a line chart from the Open column in the DataFrame.
var chart1 = Chart.Plot(
new Graph.Scatter
{
x = df1.Columns["Date"],
y = df1.Columns["Open"],
mode = "lines+markers"
}
);
var chart1_layout = new Layout.Layout{
title="Open Price",
xaxis =new Graph.Xaxis{
title = "Date"
},
yaxis =new Graph.Yaxis{
title = "Price (USD)"
}
};
chart1.WithLayout(chart1_layout);
chart1
Line Chart with Mulitple Lines
The following code plots the Open, and Close columns in a line chart.
var chart2_list = new List<Graph.Scatter>
{
new Graph.Scatter
{
x = df1.Columns["Date"],
y = df1.Columns["Open"],
name="Open",
mode = "lines"
},
new Graph.Scatter
{
x = df1.Columns["Date"],
y = df1.Columns["Close"],
name="Close",
mode = "lines"
}
};
var chart2 = Chart.Plot(
chart2_list
);
var chart2_layout = new Layout.Layout{
title="Open and Close Price",
xaxis =new Graph.Xaxis{
title = "Date"
},
yaxis =new Graph.Yaxis{
title = "Price (USD)"
}
};
chart2.WithLayout(chart2_layout);
chart2
var chart3 = Chart.Plot(
new Graph.Bar
{
x = df1.Columns["Date"],
y = df1.Columns["Volume"],
marker = new Graph.Marker{color = "rgb(0, 0, 109)"}
}
);
var chart3_layout = new Layout.Layout{
title="Volume",
xaxis =new Graph.Xaxis{
title = "Date"
},
yaxis =new Graph.Yaxis{
title = "Unit"
}
};
chart3.WithLayout(chart3_layout);
chart3
var chart4 = Chart.Candlestick(df1.OrderBy("Date").Rows.Select(row => new Tuple<string, double, double, double, double>(
((DateTime)row[0]).ToString("yyyy-MM-dd"),
double.Parse(row[1].ToString()),
double.Parse(row[2].ToString()),
double.Parse(row[3].ToString()),
double.Parse(row[4].ToString())
)));
chart4.WithLayout(new Layout.Layout{
title="OHLC",
xaxis =new Graph.Xaxis{
title = "Date"
},
yaxis =new Graph.Yaxis{
title = "Price (USD)"
}
});
chart4
References
- Phplot.sourceforge.net. n.d. 5.30. Example - Basic OHLC (Open, High, Low, Close) Financial Plot. [online] Available at: http://phplot.sourceforge.net/phplotdocs/ex-ohlcbasic.html [Accessed 6 May 2021].
- Bluemountaincapital.github.io. n.d. Deedle: Exploratory data library for .NET. [online] Available at: https://bluemountaincapital.github.io/Deedle/ [Accessed 6 May 2021].
- Govindarajan, P., 2019. An Introduction to DataFrame | .NET Blog. [online] .NET Blog. Available at: https://devblogs.microsoft.com/dotnet/an-introduction-to-dataframe/ [Accessed 6 May 2021].
- Sequeira, J., 2020. dotnet/interactive: Magic Commands. [online] GitHub. Available at: https://github.com/dotnet/interactive/blob/main/docs/magic-commands.md [Accessed 6 May 2021].
- Winnington, E., 2019. Eric Winnington - Tips and tricks for C# Jupyter notebook. [online] Ewinnington.github.io. Available at: https://ewinnington.github.io/posts/jupyter-tips-csharp [Accessed 6 May 2021].
- Fslab.org. n.d. XPlot - F# Data Visualization Package. [online] Available at: https://fslab.org/XPlot/index.html [Accessed 6 May 2021].
- Phuriphanvichai, J., 2021. Using .NET Core in Jupyter Notebook | Refinitiv Developers. [online] Developers.refinitiv.com. Available at: https://developers.refinitiv.com/en/article-catalog/article/using--net-core-in-jupyter-notebook.html [Accessed 10 May 2021].