'Read .csv File into Multiple database tables while filling the dependencies between them
I'm doing a project(ASP.Net Razor Pages) and I'm really stuck. So I have this method that reads a .csv file looking like this:
| Id | Foods | Categories | Actions |
|---|---|---|---|
| 1 | Acerola | Fruit | "DNA Protection, Immunity" |
| 2 | Almond Butter | Sweets | DNA Protection |
| 3 | Almonds | "Legumes, Nuts, Beans" | "Angiogenesis Antiangiogenic, DNA Protection" |
| 4 | Aged Garlic | Vegetables | Immunity |
| 5 | Anchovies | Vegetables | "Angiogenesis Antiangiogenic, Regeneration, DNA Protection" |
Before each "," it's a Table in the database, and the last column accepts multiple values.
The view:
...
<form asp-action="UploadFile" asp-controller="Import" method="post" enctype="multipart/form-data">
<div class="for-group">
<label class="form-label">Upload file</label>
<input name="file" type="file" class="form-control"/>
<button type="submit" class="btn-primary btn-primary">Upload</button>
</div>
</form>
...
The "Controller"
public void UploadtoDb(IFormFile file)
{
var stream = file.OpenReadStream();
using (var reader = new StreamReader(stream))
using (CsvReader csvReader = new CsvReader(reader, CultureInfo.InvariantCulture))
{
var categoryRecords = new List<Category>();
var ActionsRecords = new List<Actions>();
var foodRecords = new List<Food>();
var csv = csvReader.Read().ToString().Split(',');
csvReader.ReadHeader();
csvReader.Configuration.BadDataFound.Equals(true);
while (csvReader.Read())
{
var catRecords = new Category
{
CategoryName = csvReader.GetField("Categories")
};
categoryRecords.Add(catRecords);
var fRecords = new Food
{
FoodName = csvReader.GetField("Foods")
};
foodRecords.Add(fRecords);
foodRecords.Select(x => x.FoodName).Distinct();
var actionsRecords = new Actions
{
ActionName = csvReader.GetField("Actions")
};
ActionsRecords.Add(actionsRecords);
ActionsRecords.Select(x => x.ActionName).Distinct();
}
categoryRecords.Select(x => x.CategoryName).Distinct();
var uniqueCategories = categoryRecords.GroupBy(p => p.CategoryName)
.Select(grp => grp.First())
.ToArray();
foreach (var category in uniqueCategories)
{
_ctx.Categories.Add(category);
_ctx.SaveChanges();
}
foreach (var food in foodRecords)
{
_ctx.Foods.Add(food);
_ctx.SaveChanges();
}
foreach (var action in ActionsRecords)
_ctx.Actions.Add(action);
_ctx.SaveChanges();
}
Wherever I get to the foods, it asks me for the categoryId which is from the Categories table/Third column in the file.
I also tried with some queries and Linq i.e: food.CategoryId = _ctx.Categories.FirstOrDefault(x=>x.Id=food.Category.Id)
Pictures of the errors: enter image description here enter image description here
GiTHubCli code: gh repo clone joaoFerreiragHub/ProjetoFinal
Link: https://github.com/joaoFerreiragHub/ProjetoFinal.git
Is there a way that CsvHelper can help me out with sorting this into the tables?
Solution 1:[1]
If you want to do this quickly, the best way would be to upload the CSV file "as is" into a staging table, say "StagingFoods".
CREATE TABLE StagingFoods
(
Id int NOT NULL,
Foods nvarchar(250) NULL,
Categories nvarchar(250) NULL,
Actions nvarchar(1000) NULL,
CONSTRAINT PK_StagingFoods PRIMARY KEY CLUSTERED (Id)
)
Unfortunately, EntityFramework doesn't have a good way to do bulk inserts, but you could try a nuget third party extension like EFCore.BulkExtensions.
var records = csv.GetRecords<StagingFoods>();
_ctx.BulkInsert(records);
Then use straight SQL to create your Categories, Foods and Actions. You can use the SQL in your EntityFramework code using ExecuteSqlRaw. If you have a lot of data, this will be a lot faster than inserting them one at a time.
Category
_ctx.Database.ExecuteSqlRaw(@"
Insert into
Category(CategoryName)
select
Categories
from
StagingFoods sf
left join Category c
on sf.Categories = c.CategoryName
where
c.Id is null
group by
Categories
");
Food
_ctx.Database.ExecuteSqlRaw(@"
Insert into
Foods(FoodName,CategoryId)
select
sf.Foods
,c2.Id
from
StagingFoods sf
left join Foods f
join Category c1
on f.CategoryId = c1.Id and sf.Categories = c1.CategoryName
on sf.Foods = f.FoodName
join Category c2
on sf.Categories = c2.CategoryName
where
f.Id is null
group by
Categories
,Foods
");
I'll let you figure out how to do Actions.
When you are finished inserting all of the data in the tables, you can truncate your staging table.
_ctx.Database.ExecuteSqlRaw("TRUNCATE TABLE StagingFoods");
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
