2021SC@SDUSC
Nova source code analysis
1, What is Nova
Nova is the OpenStack project that provides compute instances (also known as virtual servers). Nova supports creating virtual machines and has limited support for system containers. It runs as a set of daemons on top of existing Linux servers to provide its services.
Nova works together with the following OpenStack services to form the basic cloud service:
- Keystone: provides authentication for all OpenStack services.
- Glance: provides the compute instance image repository; all compute instances are created from Glance images.
- Neutron: provides the virtual and physical networks that compute instances connect to when they boot.
- Placement: tracks the cloud's available resource inventory and helps decide which resources to use when creating a virtual machine.
![Nova architecture](https://docs.openstack.org/nova/xena/_images/architecture.svg)
- DB: SQL database for data storage.
- API: component that receives HTTP requests, converts commands and communicates with other components via the oslo.messaging queue or HTTP.
- Scheduler: decides which host gets each instance.
- Compute: manages communication with hypervisor and virtual machines.
- Conductor: handles requests that need coordination (build/resize), acts as a database proxy, or handles object conversions.
- Placement: tracks resource provider inventories and usages.
For end users, servers can be created and managed directly through the APIs via Horizon, the OpenStack Client, or the Nova Client.
Nova can be configured to emit notifications via RPC.
For developers, OpenStack provides a wealth of guides and references to learn from.
When a user initiates a new request, it is first processed by nova-api. nova-api performs a series of checks on the request, including whether the request is legal and whether the quota is sufficient. After the checks pass, nova-api assigns a unique virtual machine ID to the request and creates a corresponding entry in the database to record the virtual machine's state. nova-api then sends the request to nova-conductor for processing.
nova-conductor mainly manages communication between services and performs task handling. After receiving the request, it creates a RequestSpec object that wraps all scheduling-relevant request information for nova-scheduler, and then invokes the select_destinations interface of the nova-scheduler service.
From the received RequestSpec object, nova-scheduler first builds a ResourceRequest object and sends it to Placement for pre-filtering. It then makes a scheduling decision based on the latest system state in the database and tells nova-conductor to schedule the request onto the appropriate compute node.
Once nova-conductor knows the scheduling decision, it sends the request to the corresponding nova-compute service.
Each nova-compute service has an independent resource tracker that monitors the resource usage of the local host. When the compute node receives the request, the resource tracker checks whether the host has enough resources:
- If the resources are sufficient, start the specified virtual machine, update the virtual machine's state in the database, and write the host's latest resource usage to the database.
- If the current host cannot satisfy the requested resources, nova-compute rejects the request and sends it back to nova-conductor to retry the whole scheduling process.
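The check-and-reschedule behaviour described above can be sketched roughly as follows. This is a toy model under simplifying assumptions: `ResourceTracker`, `claim` and `build_instance` are illustrative stand-ins, not Nova's real interfaces.

```python
# Illustrative sketch of the compute-side resource check and reschedule
# loop described above. Names are simplified stand-ins, not Nova's API.

class ResourceTracker:
    """Tracks free resources on one compute host."""

    def __init__(self, vcpus, ram_mb):
        self.free_vcpus = vcpus
        self.free_ram_mb = ram_mb

    def claim(self, vcpus, ram_mb):
        """Reserve resources and return True if the host can fit the VM."""
        if vcpus <= self.free_vcpus and ram_mb <= self.free_ram_mb:
            self.free_vcpus -= vcpus
            self.free_ram_mb -= ram_mb
            return True
        return False


def build_instance(tracker, request, reschedule):
    """Start the VM if resources suffice, otherwise hand the request
    back to the conductor for another scheduling pass."""
    if tracker.claim(request["vcpus"], request["ram_mb"]):
        return "ACTIVE"          # spawn via the hypervisor driver
    reschedule(request)          # send back to nova-conductor
    return "RESCHEDULED"
```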
2, Components
folders:
- api: receives API requests and calls the appropriate services
- cmd: entry points for the various Nova services
- compute: the daemon that creates and terminates virtual machines and manages communication between the hypervisor and the virtual machines
- conf: configuration options
- conductor: handles requests that require coordination; the intermediary between api, scheduler and compute
- console: console services
- db: encapsulates database access
- hacking: coding style checks
- image: encapsulates image operations
- keymgr: key manager
- locale: internationalization
- network: network services
- notifications: notifications
- objects: encapsulates database operations so other code avoids touching the database directly
- pci: PCI/SR-IOV support
- scheduler: scheduler service
- servicegroup: service group membership
- storage: Ceph storage
- tests: unit tests
- virt: the supported hypervisor drivers
- volume: encapsulates the volume service, abstracting the Cinder interface
- novncproxy: the noVNC proxy service for console access
files:
```
__init__.py
availability_zones.py  # availability zone helper functions
baserpc.py             # base RPC client/server implementation
block_device.py        # block device mapping
cache_utils.py         # oslo_cache wrapper
config.py              # command line argument parsing
context.py             # context for all requests flowing through Nova
crypto.py              # wrappers for standard cryptographic routines
debugger.py            # pydev debugging
exception.py           # base exception classes
exception_wrapper.py   # exception wrapping
filters.py             # base filter
i18n.py                # oslo_i18n integration
loadables.py           # loadable classes
manager.py             # base Manager class
middleware.py          # updates default options for oslo_middleware
monkey_patch.py        # eventlet monkey patching
policy.py              # policy engine
profiler.py            # OSProfiler integration
quota.py               # per-project resource quotas
rpc.py                 # RPC helper functions
safe_utils.py          # helpers that do not cause circular imports
service.py             # base class for workers running on hosts
service_auth.py        # authentication plug-in
test.py                # base class for unit tests
utils.py               # utility functions
version.py             # version management
weights.py             # weight plug-in
wsgi.py                # server classes for WSGI applications
```
3, Introduction to related service structure
- conductor
  - api.py: wraps the RPC interface
  - rpcapi.py: provides the RPC interface
  - manager.py: handles RPC API calls

compute accesses the database through the conductor as a proxy. The conductor operates on objects; one object corresponds to one table.
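The "one object per table" idea can be pictured with a toy sketch. `FakeDB` and `Instance` here are hypothetical simplifications, not Nova's real versioned objects; in Nova, `Instance.save()` goes over RPC to nova-conductor, which performs the SQL.

```python
# Toy sketch of the object layer: compute code never touches SQL
# directly; it mutates an object whose save() is forwarded to the
# database layer (the conductor, in Nova's case).

class FakeDB:
    """Stand-in for the real database behind the conductor."""
    def __init__(self):
        self.rows = {}

    def update(self, uuid, values):
        self.rows.setdefault(uuid, {}).update(values)


class Instance:
    """Object mirroring one row of the instances table."""
    def __init__(self, db, uuid):
        self._db = db
        self.uuid = uuid
        self._changes = {}

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Track every non-private attribute change for the next save().
        if not name.startswith("_") and name != "uuid":
            self._changes[name] = value

    def save(self):
        # In Nova this would be an RPC to nova-conductor.
        self._db.update(self.uuid, self._changes)
        self._changes = {}
```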
- scheduler
  - filters: filter implementations that weed out hosts that do not qualify
  - weights: weigher implementations that compute weights and sort the hosts
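The filter/weigher division of labour can be sketched as below. `RamFilter`, `RamWeigher` and `select_host` are simplified illustrations of the pattern, not the real classes under nova/scheduler.

```python
# Rough sketch of the filter + weigher pattern used by nova-scheduler:
# filters drop unsuitable hosts, weighers rank the survivors.

class RamFilter:
    """Drop hosts that cannot fit the requested RAM."""
    def host_passes(self, host, spec):
        return host["free_ram_mb"] >= spec["ram_mb"]


class RamWeigher:
    """Prefer hosts with more free RAM."""
    def weigh(self, host):
        return host["free_ram_mb"]


def select_host(hosts, spec, filters, weighers):
    # Keep only hosts that pass every filter.
    candidates = [h for h in hosts
                  if all(f.host_passes(h, spec) for f in filters)]
    if not candidates:
        return None
    # Sum the weights and pick the highest-scoring host.
    return max(candidates,
               key=lambda h: sum(w.weigh(h) for w in weighers))
```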
4, Analysis of how Nova creates a virtual machine
First, the create method of the compute API (nova/compute/api.py) is called, and through it the compute API's _create_instance method.
```python
def create(self, context, instance_type,
           image_href, kernel_id=None, ramdisk_id=None,
           min_count=None, max_count=None,
           display_name=None, display_description=None,
           key_name=None, key_data=None, security_groups=None,
           availability_zone=None, forced_host=None, forced_node=None,
           user_data=None, metadata=None, injected_files=None,
           admin_password=None, block_device_mapping=None,
           access_ip_v4=None, access_ip_v6=None, requested_networks=None,
           config_drive=None, auto_disk_config=None, scheduler_hints=None,
           legacy_bdm=True, shutdown_terminate=False,
           check_server_group_quota=False, tags=None,
           supports_multiattach=False, trusted_certs=None,
           supports_port_resource_request=False,
           requested_host=None, requested_hypervisor_hostname=None):
    """Provision instances, sending instance information to the
    scheduler.  The scheduler will determine where the instance(s)
    go and will handle creating the DB entries.

    Returns a tuple of (instances, reservation_id)
    """
    if requested_networks and max_count is not None and max_count > 1:
        self._check_multiple_instances_with_specified_ip(
            requested_networks)
        self._check_multiple_instances_with_neutron_ports(
            requested_networks)

    if availability_zone:
        available_zones = availability_zones.\
            get_availability_zones(context.elevated(), self.host_api,
                                   get_only_available=True)
        if forced_host is None and availability_zone not in \
                available_zones:
            msg = _('The requested availability zone is not available')
            raise exception.InvalidRequest(msg)

    filter_properties = scheduler_utils.build_filter_properties(
        scheduler_hints, forced_host, forced_node, instance_type)

    return self._create_instance(
        context, instance_type,
        image_href, kernel_id, ramdisk_id,
        min_count, max_count,
        display_name, display_description,
        key_name, key_data, security_groups,
        availability_zone, user_data, metadata,
        injected_files, admin_password,
        access_ip_v4, access_ip_v6,
        requested_networks, config_drive,
        block_device_mapping, auto_disk_config,
        filter_properties=filter_properties,
        legacy_bdm=legacy_bdm,
        shutdown_terminate=shutdown_terminate,
        check_server_group_quota=check_server_group_quota,
        tags=tags, supports_multiattach=supports_multiattach,
        trusted_certs=trusted_certs,
        supports_port_resource_request=supports_port_resource_request,
        requested_host=requested_host,
        requested_hypervisor_hostname=requested_hypervisor_hostname)
```
The _create_instance method calls compute_task_api's schedule_and_build_instances method, i.e. the schedule_and_build_instances method in the conductor's api.py, which directly calls the schedule_and_build_instances method in the conductor's rpcapi.py.
```python
def schedule_and_build_instances(self, context, build_requests,
                                 request_specs, image,
                                 admin_password, injected_files,
                                 requested_networks,
                                 block_device_mapping,
                                 tags=None):
    version = '1.17'
    kw = {'build_requests': build_requests,
          'request_specs': request_specs,
          'image': jsonutils.to_primitive(image),
          'admin_password': admin_password,
          'injected_files': injected_files,
          'requested_networks': requested_networks,
          'block_device_mapping': block_device_mapping,
          'tags': tags}
    if not self.client.can_send_version(version):
        version = '1.16'
        del kw['tags']
    cctxt = self.client.prepare(version=version)
    cctxt.cast(context, 'schedule_and_build_instances', **kw)
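The version handling in this method (falling back from 1.17 to 1.16 and dropping 'tags' when the peer is too old) follows a general RPC version-negotiation pattern that can be sketched like this; `negotiate` is a hypothetical helper written for illustration, not Nova code.

```python
# Toy sketch of RPC version negotiation: if the remote side is older
# than the newest message version, fall back and drop the arguments
# that the old version does not understand.

def negotiate(kwargs, peer_max_version):
    """Return a (version, kwargs) pair compatible with the peer."""
    version = '1.17'
    kw = dict(kwargs)
    if peer_max_version < (1, 17):
        version = '1.16'
        kw.pop('tags', None)   # 'tags' was added in 1.17
    return version, kw
```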
cast is an asynchronous RPC call, so schedule_and_build_instances returns immediately.
Up to this point, although control has flowed API -> compute -> conductor, everything is still running inside the nova-api process until the cast is executed. Because the call is asynchronous, it returns immediately without waiting for an RPC reply. nova-api's work is then done, so it responds to the user's request while the virtual machine state is still "building".
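The difference between cast and call can be illustrated with an in-memory stand-in for oslo.messaging; `FakeRPCClient` and `FakeServer` are invented for illustration, not the real library classes.

```python
# Toy illustration of the two RPC semantics: cast() is fire-and-forget
# (nova-api returns immediately, VM state "building"), while call()
# blocks for the handler's reply (used later for select_destinations).

class FakeRPCClient:
    def __init__(self, server):
        self.server = server

    def cast(self, method, **kwargs):
        """Asynchronous: queue the work and return None immediately."""
        self.server.pending.append((method, kwargs))

    def call(self, method, **kwargs):
        """Synchronous: dispatch and wait for the handler's result."""
        return getattr(self.server, method)(**kwargs)


class FakeServer:
    def __init__(self):
        self.pending = []

    def select_destinations(self, spec):
        # A real scheduler would filter and weigh hosts here.
        return ["host1"]
```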
After that, the request is passed through oslo.messaging to the conductor's manager.py, where schedule_and_build_instances is invoked; it first calls the _schedule_instances method, which leads to select_destinations.
```python
def schedule_and_build_instances(self, context, build_requests,
                                 request_specs, image,
                                 admin_password, injected_files,
                                 requested_networks, block_device_mapping,
                                 tags=None):
    # Add all the UUIDs for the instances
    instance_uuids = [spec.instance_uuid for spec in request_specs]
    try:
        host_lists = self._schedule_instances(context, request_specs[0],
                                              instance_uuids,
                                              return_alternates=True)
    except Exception as exc:
        LOG.exception('Failed to schedule instances')
        self._bury_in_cell0(context, request_specs[0], exc,
                            build_requests=build_requests,
                            block_device_mapping=block_device_mapping,
                            tags=tags)
        return
```
scheduler_client, like compute_api and compute_task_api, encapsulates calls to the service's client. The scheduler, however, has no api.py module; instead it has a separate client directory, and the query.py module under nova/scheduler/client implements select_destinations, which directly calls the scheduler rpcapi's select_destinations method, finally reaching the RPC call stage.
The RPC encapsulation is implemented in the scheduler's rpcapi.py:
```python
cctxt = self.client.prepare(
    version=version, call_monitor_timeout=CONF.rpc_response_timeout,
    timeout=CONF.long_rpc_timeout)
return cctxt.call(ctxt, 'select_destinations', **msg_args)
```
call is the synchronous RPC method: the conductor waits for the scheduler to return. At this point the scheduler takes over the task.
rpcapi calls the corresponding select_destinations method of the scheduler's manager.py, which in turn calls the driver's select_destinations method. The driver here is the scheduling driver, specified in the [scheduler] group of the configuration file; by default it is filter_scheduler, corresponding to the nova/scheduler/filter_scheduler.py module. Its algorithm filters out compute nodes that do not meet the conditions using the configured filters, computes a weight for each remaining node with the weighers, and finally picks the highest-weighted node as the candidate compute node.
Finally, nova-scheduler returns the collection of selected hosts and its task ends. Because nova-conductor invoked this method synchronously, nova-scheduler returns the result to the nova-conductor service.
Having received the scheduler's reply, the conductor resumes in the schedule_and_build_instances method of its manager.py.
After that, it calls compute_rpcapi's build_and_run_instance:
```python
with obj_target_cell(instance, cell) as cctxt:
    self.compute_rpcapi.build_and_run_instance(
        cctxt, instance=instance, image=image,
        request_spec=request_spec,
        filter_properties=filter_props,
        admin_password=admin_password,
        injected_files=injected_files,
        requested_networks=requested_networks,
        security_groups=legacy_secgroups,
        block_device_mapping=instance_bdms,
        host=host.service_host,
        node=host.nodename, limits=host.limits,
        host_list=host_list, accel_uuids=accel_uuids)
```
Similarly, rpcapi asynchronously calls the compute method of the same name, and nova-compute takes over the task.
In compute's manager.py we find the build_and_run_instance method:
```python
def build_and_run_instance(self, context, instance, image, request_spec,
                           filter_properties, accel_uuids,
                           admin_password=None, injected_files=None,
                           requested_networks=None, security_groups=None,
                           block_device_mapping=None, node=None,
                           limits=None, host_list=None):

    @utils.synchronized(instance.uuid)
    def _locked_do_build_and_run_instance(*args, **kwargs):
        # NOTE(danms): We grab the semaphore with the instance uuid
        # locked because we could wait in line to build this instance
        # for a while and we want to make sure that nothing else tries
        # to do anything with this instance while we wait.
        with self._build_semaphore:
            try:
                result = self._do_build_and_run_instance(*args, **kwargs)
            except Exception:
                # NOTE(mriedem): This should really only happen if
                # _decode_files in _do_build_and_run_instance fails, and
                # that's before a guest is spawned so it's OK to remove
                # allocations for the instance for this node from
                # Placement below as there is no guest consuming
                # resources anyway. The _decode_files case could be
                # handled more specifically but that's left for
                # another day.
                result = build_results.FAILED
                raise
            finally:
                if result == build_results.FAILED:
                    # Remove the allocation records from Placement for
                    # the instance if the build failed. The
                    # instance.host is likely set to None in
                    # _do_build_and_run_instance which means if the user
                    # deletes the instance, it will be deleted in the
                    # API, not the compute service. Setting the
                    # instance.host to None in
                    # _do_build_and_run_instance means that the
                    # ResourceTracker will no longer consider this
                    # instance to be claiming resources against it, so
                    # we want to reflect that same thing in Placement.
                    # No need to call this for a reschedule, as the
                    # allocations will have already been removed in
                    # self._do_build_and_run_instance().
                    self.reportclient.delete_allocation_for_instance(
                        context, instance.uuid)

                if result in (build_results.FAILED,
                              build_results.RESCHEDULED):
                    self._build_failed(node)
                else:
                    self._build_succeeded(node)

    # NOTE(danms): We spawn here to return the RPC worker thread back to
    # the pool. Since what follows could take a really long time, we don't
    # want to tie up RPC workers.
    utils.spawn_n(_locked_do_build_and_run_instance,
                  context, instance, image, request_spec,
                  filter_properties, admin_password, injected_files,
                  requested_networks, security_groups,
                  block_device_mapping, node, limits, host_list,
                  accel_uuids)
```
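The concurrency pattern here (a per-instance lock, a semaphore bounding concurrent builds, and spawning off the long-running work so the RPC worker thread is freed) can be sketched with the standard library. Nova itself uses eventlet and utils.synchronized, so this threading version is only an approximation with invented names.

```python
# Sketch of the pattern: lock per instance UUID, bound total concurrent
# builds with a semaphore, and hand the long-running work to another
# thread so the caller ("RPC worker") returns immediately.
import threading

_build_semaphore = threading.Semaphore(2)   # max concurrent builds
_instance_locks = {}

def build_and_run(uuid, do_build, results):
    lock = _instance_locks.setdefault(uuid, threading.Lock())

    def _locked_build():
        with lock:                  # nothing else touches this instance
            with _build_semaphore:  # wait for a build slot
                results.append(do_build(uuid))

    t = threading.Thread(target=_locked_build)
    t.start()                       # free the "RPC worker" immediately
    return t
```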
The driver here is the compute driver, specified by the compute_driver configuration option, in this case libvirt.LibvirtDriver; the code is located in nova/virt/libvirt/driver.py. Its spawn() method calls libvirt to create the virtual machine and waits until the virtual machine state is active. At that point the nova-compute service's work ends, and the whole process of creating a virtual machine is complete.
5, Summary
There is much more to explore in Nova's architecture and way of working. The recurring pattern is that the different layers communicate through RPC and invoke the implementation methods in each service's manager, but the specific strategies deserve further study.