'oracle sql - delete rows with null value for every group

I try to delete rows with null values at the mail column, if and only if exists a row with a non-null value for the same person_id.

DELETE
FROM
  EMP_MAIL EM
WHERE
  MAIL IS NULL
AND EXISTS
  (
    SELECT 
      EMP_ID
    FROM
      EMP_MAIL EM2
    WHERE
      MAIL IS NOT NULL
    AND EM.EMP_ID    = EM2.EMP_ID
  ) ;

but it takes quite a while. (i have an non-unique index on EMP_MAIL(EMP_ID))

is there a better way to preform this operation ?



Solution 1:[1]

it depends on your input data, i hope this query can help you or give you some clues to solve the problem:

DELETE
from 
EMP_MAIL t1 
where 
t1.MAIL is null and 
not Exists
(select t2.EMP_ID from (select EMP_ID from EMP_MAIL where mail is not null
minus
select EMP_ID from EMP_MAIL where mail is null)t2 where t1.EMP_ID =t2.EMP_ID);

Solution 2:[2]

Well, I've tested a bit and now I'm going to present the result.

This is my test script (have a look: no indices at all):

create table emp_mail (
  emp_id        number,
  mail          clob
);

declare
  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;
begin
  for i in 1..300 loop
    insert into emp_mail
    select round(dbms_random.value*amount_of_users),case when dbms_random.value < nulls_amount_percent/100 then null else 'some mail' end
    from dual
    connect by level <= 10000;

    commit;
  end loop;
end;
/
delete from emp_mail em
where mail is null
  and exists (select null
              from emp_mail em2
              where mail is not null
                and em.emp_id = em2.emp_id);

drop table emp_mail;

It creates the table (I create with CLOB to have the worst case), then it fills the table with 3M of rows, by the constants

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

you can play around the rows which are going to be removed, then it performs your "delete" statement and drops the table.

Let's talk about numbers.

Warning: when I talk about seconds I mean only delete statement, because row generator anonymous block takes nearly 2 minutes.

With these values

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

nearly the whole table is deleted and I can reproduce your mystic 27.5 seconds.

Then I take

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 1;

to decrease the amount of removed rows and it gives me only 1.8 seconds.

Let's take a case when no rows are deleted:

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 0;

Guess time? 1.1 second, and as far as you told it is your case.

Well, I can't give you any advice what to do because the thing you are talking is probably a bit more global then one query. Perhaps, you database version is too old, or it is about hardware or anything else.

If it is about this case

  amount_of_users        constant number := 30000;
  nulls_amount_percent   constant number := 90;

when really many rows are removed I can advice to split the deleting into e.g. 10000-rows-parts (and firstly select count(*) to know how many times you should run the restricted deleting) and make a commit after every, sometimes it is much faster then to delete big amounts of rows. But as far as you told that you have 0 rows deleted it shouldn't be your problem.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Hamidreza
Solution 2 smnbbrv