'Read .csv File into Multiple database tables while filling the dependencies between them

I'm doing a project(ASP.Net Razor Pages) and I'm really stuck. So I have this method that reads a .csv file looking like this:

Id Foods Categories Actions
1 Acerola Fruit "DNA Protection, Immunity"
2 Almond Butter Sweets DNA Protection
3 Almonds "Legumes, Nuts, Beans" "Angiogenesis Antiangiogenic, DNA Protection"
4 Aged Garlic Vegetables Immunity
5 Anchovies Vegetables "Angiogenesis Antiangiogenic, Regeneration, DNA Protection"

Before each "," it's a Table in the database, and the last column accepts multiple values.

The view:

...
    <form asp-action="UploadFile" asp-controller="Import" method="post" enctype="multipart/form-data">
       <div class="for-group">
           <label class="form-label">Upload file</label>
           <input name="file" type="file" class="form-control"/>
           <button type="submit" class="btn-primary btn-primary">Upload</button>
       </div>
    </form>

...

The "Controller"

public void UploadtoDb(IFormFile file)
{
    var stream = file.OpenReadStream();
    using (var reader = new StreamReader(stream))
    using (CsvReader csvReader = new CsvReader(reader, CultureInfo.InvariantCulture))
    {

        var categoryRecords = new List<Category>();
        var ActionsRecords = new List<Actions>();
        var foodRecords = new List<Food>();
        var csv = csvReader.Read().ToString().Split(',');
        csvReader.ReadHeader();
        csvReader.Configuration.BadDataFound.Equals(true);

        while (csvReader.Read())
        {
            var catRecords = new Category
            {
                CategoryName = csvReader.GetField("Categories")
            };
            categoryRecords.Add(catRecords);

            var fRecords = new Food
            {
                FoodName = csvReader.GetField("Foods")
            };

            foodRecords.Add(fRecords);
            foodRecords.Select(x => x.FoodName).Distinct();

            var actionsRecords = new Actions
            {
                ActionName = csvReader.GetField("Actions")
            };
            ActionsRecords.Add(actionsRecords);
            ActionsRecords.Select(x => x.ActionName).Distinct();
        }
        categoryRecords.Select(x => x.CategoryName).Distinct();
        var uniqueCategories = categoryRecords.GroupBy(p => p.CategoryName)
                   .Select(grp => grp.First())
                   .ToArray();
        foreach (var category in uniqueCategories)
        {
            _ctx.Categories.Add(category);
            _ctx.SaveChanges();
        }
        foreach (var food in foodRecords)
        {
            _ctx.Foods.Add(food);
            _ctx.SaveChanges();
        }

        foreach (var action in ActionsRecords)
            _ctx.Actions.Add(action);
        _ctx.SaveChanges();
    }

Wherever I get to the foods, it asks me for the categoryId which is from the Categories table/Third column in the file.

I also tried with some queries and Linq i.e: food.CategoryId = _ctx.Categories.FirstOrDefault(x=>x.Id=food.Category.Id)

Pictures of the errors: enter image description here enter image description here

GiTHubCli code: gh repo clone joaoFerreiragHub/ProjetoFinal

Link: https://github.com/joaoFerreiragHub/ProjetoFinal.git

Is there a way that CsvHelper can help me out with sorting this into the tables?



Solution 1:[1]

If you want to do this quickly, the best way would be to upload the CSV file "as is" into a staging table, say "StagingFoods".

CREATE TABLE StagingFoods
(
    Id int NOT NULL,
    Foods nvarchar(250) NULL,
    Categories nvarchar(250) NULL,
    Actions nvarchar(1000) NULL,
    CONSTRAINT PK_StagingFoods PRIMARY KEY CLUSTERED (Id)
)

Unfortunately, EntityFramework doesn't have a good way to do bulk inserts, but you could try a nuget third party extension like EFCore.BulkExtensions.

var records = csv.GetRecords<StagingFoods>();
_ctx.BulkInsert(records);

Then use straight SQL to create your Categories, Foods and Actions. You can use the SQL in your EntityFramework code using ExecuteSqlRaw. If you have a lot of data, this will be a lot faster than inserting them one at a time.

Category

_ctx.Database.ExecuteSqlRaw(@"
Insert into
    Category(CategoryName)
select
    Categories
from
    StagingFoods sf
        left join Category c
        on sf.Categories = c.CategoryName
where
    c.Id is null
group by
    Categories
");

Food

_ctx.Database.ExecuteSqlRaw(@"
Insert into
    Foods(FoodName,CategoryId)
select
    sf.Foods
   ,c2.Id
from
    StagingFoods sf
        left join Foods f
            join Category c1
            on f.CategoryId = c1.Id and sf.Categories = c1.CategoryName
        on sf.Foods = f.FoodName 
        join Category c2
        on sf.Categories = c2.CategoryName
where
    f.Id is null
group by
    Categories
   ,Foods
");

I'll let you figure out how to do Actions.

When you are finished inserting all of the data in the tables, you can truncate your staging table.

_ctx.Database.ExecuteSqlRaw("TRUNCATE TABLE StagingFoods");

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1