'How does generating an interrupt (calling irqfd) calls an interrupt on the KVM VM?
The KVM irqfd ioctl starts the irqfd for a file descriptor. It does this:
case KVM_IRQFD: {
struct kvm_irqfd data;
r = -EFAULT;
if (copy_from_user(&data, argp, sizeof(data)))
goto out;
r = kvm_irqfd(kvm, &data);
break;
}
where kvm_irqfd is here
and calls kvm_irqfd_assign which initiates a wakeup queue:
init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
That is, irqfd_wakeup does this:
if (flags & EPOLLIN) {
u64 cnt;
eventfd_ctx_do_read(irqfd->eventfd, &cnt);
idx = srcu_read_lock(&kvm->irq_srcu);
do {
seq = read_seqcount_begin(&irqfd->irq_entry_sc);
irq = irqfd->irq_entry;
} while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
/* An event has been signaled, inject an interrupt */
if (kvm_arch_set_irq_inatomic(&irq, kvm,
KVM_USERSPACE_IRQ_SOURCE_ID, 1,
false) == -EWOULDBLOCK)
schedule_work(&irqfd->inject);
srcu_read_unlock(&kvm->irq_srcu, idx);
ret = 1;
}
As you can see in schedule_work(&irqfd->inject), it schedules the inject function, which is here:
static void
irqfd_inject(struct work_struct *work)
{
struct kvm_kernel_irqfd *irqfd =
container_of(work, struct kvm_kernel_irqfd, inject);
struct kvm *kvm = irqfd->kvm;
if (!irqfd->resampler) {
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1,
false);
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0,
false);
} else
kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
irqfd->gsi, 1, false);
}
It calls kvm_set_irq defined here which does this:
int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
bool line_status)
{
struct kvm_kernel_irq_routing_entry irq_set[KVM_NR_IRQCHIPS];
int ret = -1, i, idx;
trace_kvm_set_irq(irq, level, irq_source_id);
/* Not possible to detect if the guest uses the PIC or the
* IOAPIC. So set the bit in both. The guest will ignore
* writes to the unused one.
*/
idx = srcu_read_lock(&kvm->irq_srcu);
i = kvm_irq_map_gsi(kvm, irq_set, irq);
srcu_read_unlock(&kvm->irq_srcu, idx);
while (i--) {
int r;
r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level,
line_status);
if (r < 0)
continue;
ret = r + ((ret < 0) ? 0 : ret);
}
return ret;
}
It looks like it finally calls something at:
r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level,
line_status);
This set function is filled by this.
It sets to this function:
static int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id,
int level, bool line_status)
{
unsigned int spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
if (!vgic_valid_spi(kvm, spi_id))
return -EINVAL;
return kvm_vgic_inject_irq(kvm, 0, spi_id, level, NULL);
}
which calls kvm_vgic_inject_irq which finally calls vgic_put_irq which calls this:
void __vgic_put_lpi_locked(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
if (!kref_put(&irq->refcount, vgic_irq_release))
return;
list_del(&irq->lpi_list);
dist->lpi_list_count--;
kfree(irq);
}
but I don't see how the GIC is called here, I only see the list being deleted.
I thought here it would send the interrupt to the GIC, which would then call the VM somehow. I'm trying to understand how calling the irqfd file descriptor ends up calling an interrupt in the VM.
Solution 1:[1]
VGIC is for arm, you should check arm support file. While x86 is using either APIC or PIC. mostly APIC now. You can check the specification of how those IRQ chip works to transfer the external signal to the destination core(vcpu).
For example, if you were using a x86 virtual machine(I have no idea of VGIC) which is using IOAPIC, there are 24 pins for example(emulated), and you should understand the APIC(hardware), then you know how it works.
https://elixir.bootlin.com/linux/v5.2.12/source/arch/x86/kvm/irq_comm.c#L271
https://elixir.bootlin.com/linux/v5.2.12/source/arch/x86/kvm/irq_comm.c#L38
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tong Xing |
