- Blog
- Tip 1 - Cleaning / validating nopCommerce data (customers)
Tip 1 - Cleaning / validating nopCommerce data (customers)
- 4:13:35 AM
- Friday, August 12, 2022
tip
💡 Learn more : Validating new customers online
Remove nopCommerce fake customers
Using nopCommerce day after day means that your database will grow, and some data will be not relevant. It can be many reasons behind the scene for this, e.x.
- one of our customer use product import, and he had to remove products from time to time
- if you are like me, you will get a lot of customers registration because of the bots. Those bots crawl your site buy different reasons and some of them are able to go through reCaptcha.
In this post, I'll walk you through steps of finding data that are not needed and removing them.
Part 1 - Determine how much space on disk each table is consuming, and which tables should be cleaned first
SELECT
t.NAME AS TableName,
s.Name AS SchemaName,
p.rows,
SUM(a.total_pages) * 8 AS TotalSpaceKB,
CAST(ROUND(((SUM(a.total_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS TotalSpaceMB,
SUM(a.used_pages) * 8 AS UsedSpaceKB,
CAST(ROUND(((SUM(a.used_pages) * 8) / 1024.00), 2) AS NUMERIC(36, 2)) AS UsedSpaceMB,
(SUM(a.total_pages) - SUM(a.used_pages)) * 8 AS UnusedSpaceKB,
CAST(ROUND(((SUM(a.total_pages) - SUM(a.used_pages)) * 8) / 1024.00, 2) AS NUMERIC(36, 2)) AS UnusedSpaceMB
FROM
sys.tables t
INNER JOIN
sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
sys.schemas s ON t.schema_id = s.schema_id
WHERE
t.NAME NOT LIKE 'dt%'
AND t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY
TotalSpaceMB DESC, t.Name
Results for my DB:
As you can see the biggist tables are Customer, GenericAttribute and QueuedEmail
How do you fix it?
Part 2 - Removing fake customers:
Ideally we should validate all customers emails with 3d party webservice but I have 1Mln recods and most of those customers spent less then one minute.
So let's remove customers that didn't create an order, post and so on. I don't need customers that did zero activity on my site.
DELETE c FROM Customer c
LEFT JOIN ActivityLog al ON al.CustomerId=c.Id
LEFT JOIN [dbo].[ShoppingCartItem] sci ON sci.CustomerId=c.Id
LEFT JOIN [dbo].[Forums_Post] fp ON fp.CustomerId=c.Id
LEFT JOIN Forums_Topic ft ON fp.CustomerId=c.Id
LEFT JOIN [Order] o ON o.CustomerId=c.Id
LEFT JOIN [dbo].[DP_Entity] e1 ON e1.CustomerId=c.Id
LEFT JOIN [dbo].[DP_Entity] e2 ON e2.CustomerIdOfLastChange=c.Id
WHERE DATEDIFF(MINUTE, c.CreatedOnUtc, LastActivityDateUtc)<1 AND c.Id>3
AND al.Id IS NULL
AND sci.Id IS NULL
AND fp.Id IS NULL
AND ft.Id IS NULL
AND o.Id IS NULL
AND e1.Id IS NULL
AND e2.Id IS NULL
Part 3 - Removing fake customers atts:
DELETE ga
FROM [dbo].[GenericAttribute] ga
LEFT JOIN [dbo].[Customer] c on ga.EntityId=c.Id
WHERE KeyGroup='Customer' AND c.Id IS NULL
Part 4 - Removing QueuedEmail and Log:
TRUNCATE TABLE [dbo].[QueuedEmail]
TRUNCATE TABLE [dbo].[Log]
I hope this helps someone out there and if you want to stay in touch then I can be found on Twitter or GitHub .