To address the formidable challenges posed by scenarios in which multiple jammers jam multiple radars, challenges that arise from spatial discretization, many degrees of freedom, numerous model input parameters, complex constraints, and a multi-peaked objective function, this paper proposes a cooperative jamming resource allocation method, based on evolutionary reinforcement learning, that uses joint multi-domain information. Firstly, an adversarial scenario model is established, characterizing the interaction between multiple jammers and radars based on a multi-beam jammer model and a radar detection model. Subsequently, considering real-world scenarios, this paper analyzes the constraints and objective function involved in cooperative jamming resource allocation by multiple jammers. Finally, accounting for the impact of spatial, frequency, and energy domain information on jamming resource allocation, matrices representing spatial condition constraints, jamming beam allocation, and jamming power allocation are formulated to characterize the cooperative jamming resource allocation problem. On this foundation, the joint allocation of jamming beams and jamming power is optimized under the constraints of limited jamming resources. Simulation experiments show that, compared to the dung beetle optimizer (DBO) algorithm and the particle swarm optimization (PSO) algorithm, the proposed evolutionary reinforcement learning algorithm based on DBO and Q-learning (DBO-QL) offers 3.03% and 6.25% improvements in jamming benefit and 26.33% and 50.26% improvements in optimization success rate, respectively. In terms of response time, the proposed hybrid DBO-QL algorithm responds in 0.11 s, which is 97.35% and 96.57% lower than the response times of the DBO and PSO algorithms, respectively. The results show that the proposed method has good convergence, stability, and timeliness.
Keywords: electronic countermeasures; jamming resource allocation; cooperative jamming; reinforcement learning; dung beetle optimizer algorithm
With the advancements in electronic warfare and technologies such as artificial intelligence (AI), radar and jamming systems endowed with cognitive capabilities have made significant progress.
As the complexity of real battlefield environments continues to increase, the efficient allocation of jamming resources, enabling jammers to maximize their jamming effectiveness with limited resources, has become a central challenge in the field of electronic warfare and related domains. Jamming resource allocation is one manifestation of resource allocation in the domain of electronic warfare, and resource allocation methods from different domains offer opportunities for mutual cross-fertilization. In recent years, research on communication resource allocation in the context of cognitive communication has yielded rich results, encompassing aspects such as time, frequency, and power allocation.
Numerous scholars have conducted research on the problem of cooperative jamming resource allocation for multiple jammers.
In recent years, reinforcement learning (RL) technology has developed and deepened, owing to its ability to learn strategies that maximize rewards through the agent's interaction with the environment.
The development of evolutionary reinforcement learning (ERL), which combines the global search capability of evolutionary algorithms with the decision-making efficiency of reinforcement learning, provides a promising avenue for solving such complex optimization problems.
Based on the above research findings, this paper further investigates the problem of cooperative jamming resource allocation based on multi-domain information. Combined with the idea of evolutionary reinforcement learning, this paper proposes a two-layer model for cooperative jamming resource allocation that integrally considers information in three dimensions: spatial, frequency, and energy domains. The jamming beam allocation matrix and jamming power allocation matrix were optimized using the outer DBO algorithm and the inner Q-learning algorithm, respectively. This method transforms the complex two-dimensional decision problem of the joint optimization of jamming beam allocation and jamming power allocation into two one-dimensional decision problems. The smaller dimensions of the solution space effectively prevent the algorithm from becoming trapped in a locally optimal solution and, at the same time, reduce the response time of the algorithm and ensure its stability.
In summary, the main contributions of this paper can be outlined as follows:
- A model containing a spatial condition constraint matrix, a jamming beam allocation matrix, and a jamming power allocation matrix was constructed to address the cooperative-jamming resource allocation problem. The model integrates information from the spatial, frequency, and energy domains to formulate constraints and an objective function, making its results more aligned with the complex real-world environment.
- In order to better solve the constructed model, an evolutionary reinforcement learning method called the hybrid DBO-QL algorithm was proposed. This method adopts a hierarchical selection and joint optimization strategy for jamming beam and power allocation, greatly reducing the response time of the algorithm while providing good convergence and stability.
The rest of this paper is organized as follows. Section 2 introduces the adversarial scenario model and formulates the constraints and objective function for cooperative jamming resource allocation among multiple jammers. Section 3 provides a detailed explanation of the DBO algorithm, the Q-learning algorithm, and the proposed hybrid DBO-QL algorithm. Section 4 presents the simulation experiments and an analysis of the results for the three algorithms. Finally, the conclusions drawn from this study are presented in Section 5.
Section 2 describes the modelling of an adversarial scenario for cooperative jamming resource allocation and the design of the constraints and an objective function based on the model.
This paper considers a "many-to-many" adversarial scenario model, as illustrated in Figure 1. In the spatial adversarial scenario, multiple multi-beam jamming aircraft (jammers) collaborate to perform coordinated jamming tasks against multiple ground radars. The multi-beam jamming systems of each aircraft can simultaneously generate multiple jamming beams to jam multiple radars in different directions. The resources, such as the pointing direction, quantity, and transmission power of the jamming beams, can be flexibly controlled. The radar side detects targets based on the received signal-to-jamming ratio (SJR). Due to the limited jamming power that each jamming aircraft can provide, it is necessary to reasonably allocate the attitude of each jamming aircraft and the transmission power of different beams based on real-time battlefield situation information to improve the utilization of jamming resources. This allocation method is designed to achieve the highest possible jamming benefit using limited jamming resources.
From a mathematical perspective, the problem of cooperative jamming resource allocation using multiple jammers can be formulated as a multi-choice problem under multi-dimensional constraints. The objective is to minimize the detection performance of enemy radars under the constraint of limited jamming resources. In light of this, the present study integrates information from the spatial, frequency, and energy domains to consider and formulate the constraints and objective function for the cooperative-jamming resource allocation problem.
Assume there are M jammers and N radars in the adversarial scenario. To maximize the jamming benefit, two jamming resources of the cooperative jamming system must be optimized, namely, the jamming beam directions from jammers to radars and the transmission power of the different jamming beams. For this purpose, a binary variable matrix K was defined to characterize the allocation relationship of jamming beam directions from jammers to radars, as shown in Equation (1):

$$\mathbf{K} = \left[ k_{mn} \right]_{M \times N}, \quad k_{mn} \in \{0, 1\} \tag{1}$$

Here, $k_{mn} = 1$ indicates that jammer m allocates a jamming beam to radar n, and $k_{mn} = 0$ indicates that it does not.
In terms of the allocation of transmission power for the different jamming beams, a jamming power allocation matrix P was defined to quantify the distribution of power resources in the cooperative jamming system, as given in Equation (2):

$$\mathbf{P} = \left[ p_{mn} \right]_{M \times N}, \quad p_{mn} \ge 0 \tag{2}$$

Here, $p_{mn}$ denotes the jamming power that jammer m allocates to the beam directed at radar n; $p_{mn} = 0$ whenever no beam is assigned, i.e., whenever $k_{mn} = 0$.
Therefore, the objective of the cooperative-jamming resource allocation problem for multiple jammers is to solve for the optimal jamming beam allocation matrix K and jamming power allocation matrix P under multiple constraints. In this regard, the aim is to achieve optimal jamming performance in situations where system-jamming resources are limited. Furthermore, considering the constraints in cooperative-jamming resource allocation for multiple jammers, this paper designed an objective function that accurately quantifies the jamming benefit obtained from cooperative-jamming resource allocation. By solving the optimization problem, the best configuration for the system's jamming resource variables is sought to maximize the jamming benefit characterized by the objective function.
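To make these decision variables concrete, the following minimal Python sketch (illustrative only; the variable names and random values are not from the paper) represents K and P as numpy arrays with the dimensions used in the Section 4 simulation:

```python
import numpy as np

M, N = 4, 10  # number of jammers and radars, as in the Section 4 simulation
rng = np.random.default_rng(0)

# Candidate beam allocation matrix K of Equation (1): k_mn in {0, 1}
K = rng.integers(0, 2, size=(M, N))

# Power allocation matrix P of Equation (2): power only on allocated beams (kW)
P = rng.uniform(0.0, 0.2, size=(M, N)) * K
```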
Setting the constraints for jamming resource allocation adequately and reasonably is beneficial for quickly finding the optimal jamming resource allocation strategy based on the actual battlefield situation, thereby enhancing the efficiency of jammer utilization. In this regard, this paper considers constraints for the system model based on the following five aspects:
- 1. Spatial condition constraint. When allocating jamming resources, practical considerations mean that not every jammer can be assigned to jam a given target radar. An essential prerequisite for such an assignment is that the jammer lies within the beam coverage area of that radar. Therefore, this paper defined a binary variable matrix Q to characterize the spatial relationship between jammers and radars, as shown in Equation (3):

$$q_{mn} = \begin{cases} 1, & \text{jammer } m \text{ is within the beam coverage of radar } n \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where $q_{mn}$ indicates whether jammer m satisfies the spatial condition for jamming radar n; a jamming beam can only be allocated where this condition holds, i.e., $k_{mn} \le q_{mn}$.
- 2. Jamming beam allocation quantity constraint. Constrained by the payload capacity of the jammer itself, it is assumed that each jammer can simultaneously allocate a maximum of l jamming beams to jam multiple radars, as follows:

$$\sum_{n=1}^{N} k_{mn} \le l, \quad m = 1, 2, \ldots, M \tag{4}$$
- 3. Jamming resource utilization constraint. A single jamming beam emitted by a multi-beam jammer can effectively jam one radar. In practice, the number of radars to be jammed may be several times the number of jammers. To enhance the efficiency of jamming resources and avoid wastage, it is stipulated that each radar can be assigned a maximum of one jamming beam, as follows:

$$\sum_{m=1}^{M} k_{mn} \le 1, \quad n = 1, 2, \ldots, N \tag{5}$$
- 4. Jamming power allocation constraint. Each targeted radar must be allocated jamming power. It is assumed that an SJR greater than 14 dB at the radar end indicates that the jamming provided by the jammers is completely ineffective against the radar in question, whereas an SJR below 3 dB indicates that the radar has been effectively jammed; allocating further jamming power beyond this range would waste jamming resources. The constraint is as follows:

$$3\ \mathrm{dB} \le \mathrm{SJR}_{n} \le 14\ \mathrm{dB}, \quad \forall n: \sum_{m=1}^{M} k_{mn} = 1 \tag{6}$$
- 5. Total jamming power constraint. Constrained by the payload capacity of the jammer itself, the sum of the jamming power allocated to all jamming beams of a multi-beam jammer should not exceed the maximum jamming power it can provide (a feasibility check covering constraints (3)-(5) and (7) is sketched after this list):

$$\sum_{n=1}^{N} k_{mn}\, p_{mn} \le P_{m}^{\max}, \quad m = 1, 2, \ldots, M \tag{7}$$
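Taken together, constraints (3)-(5) and (7) reduce to simple array checks. The sketch below is a hypothetical helper, not the authors' code; constraint (6) is omitted because it requires the SJR model introduced later:

```python
import numpy as np

def is_feasible(K, P, Q, l, P_max):
    """Check constraints (3)-(5) and (7) for a candidate allocation.

    K: (M, N) binary beam allocation, P: (M, N) power in kW,
    Q: (M, N) binary spatial condition matrix, l: max beams per jammer,
    P_max: (M,) per-jammer power budget in kW.
    """
    if np.any(K > Q):                        # Eq. (3): beam only where spatially allowed
        return False
    if np.any(K.sum(axis=1) > l):            # Eq. (4): at most l beams per jammer
        return False
    if np.any(K.sum(axis=0) > 1):            # Eq. (5): at most one beam per radar
        return False
    if np.any((K * P).sum(axis=1) > P_max):  # Eq. (7): per-jammer power budget
        return False
    return True
```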
The purpose of multi-jammer cooperative-jamming resource allocation is to minimize the detection probability of a radar system by optimizing the allocation of the jamming beam and power resources of the multi-jammer cooperative-jamming system while satisfying the working frequency matching between the jammers and the radars. Therefore, the objective function designed in this paper consists of the following four evaluation factors:
- 1. Spectral alignment.

The spectral alignment factor for jammer m with radar n, denoted as $\mu_{mn}$, measures how well the jamming band covers the radar's operating band. Assuming the jamming frequency range of jammer m is $[f_{m1}, f_{m2}]$ and the operating frequency range of radar n is $[f_{n1}, f_{n2}]$, the possible spectral relationships between the two bands fall into the five scenarios illustrated in Figure 2.
From Figure 2, it can be observed that in scenarios (a), (b), and (c), where the jammer's jamming band overlaps the radar's operating band, the overlapping region can be represented using Equation (8):

$$\Delta f_{mn} = \min(f_{m2}, f_{n2}) - \max(f_{m1}, f_{n1}) \tag{8}$$
In scenarios (d) and (e), where the jammer's jamming band does not overlap the radar's operating band, the spectral overlap degree should be 0. However, if Equation (8) is applied directly in these cases, it yields a negative value:

$$\min(f_{m2}, f_{n2}) - \max(f_{m1}, f_{n1}) < 0 \tag{9}$$
Therefore, a formula for the spectral overlap degree $\Delta f_{mn}$ that satisfies all the situations depicted in Figure 2 can be written as Equation (10):

$$\Delta f_{mn} = \max\left(0,\ \min(f_{m2}, f_{n2}) - \max(f_{m1}, f_{n1})\right) \tag{10}$$

where $\Delta f_{mn}$ denotes the spectral overlap between the jamming band of jammer m and the operating band of radar n. On this basis, the spectral alignment factor is obtained by normalizing the overlap by the radar's operating bandwidth:

$$\mu_{mn} = \frac{\Delta f_{mn}}{f_{n2} - f_{n1}} \tag{11}$$

where $\mu_{mn} \in [0, 1]$; a larger value indicates that the jamming signal of jammer m is better matched to radar n in the frequency domain. The overall spectral alignment factor $f_1$ averages $\mu_{mn}$ over all allocated jammer–radar pairs.
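For reference, the overlap and alignment computations of Equations (10) and (11) can be expressed as follows; note that the normalization by the radar bandwidth in Equation (11) follows the reconstruction above and should be read as an assumption. The usage line plugs in jammer 1 and radar 6 from Table 2 and Table 1:

```python
def spectral_overlap(f_m1, f_m2, f_n1, f_n2):
    """Spectral overlap of Equation (10); zero when the bands are disjoint."""
    return max(0.0, min(f_m2, f_n2) - max(f_m1, f_n1))

def spectral_alignment(f_m1, f_m2, f_n1, f_n2):
    """Alignment factor of Equation (11): overlap normalized by radar bandwidth."""
    return spectral_overlap(f_m1, f_m2, f_n1, f_n2) / (f_n2 - f_n1)

# Jammer 1 covers 6.3-8.2 GHz (Table 2); radar 6 operates at 7.3 GHz with a
# 33 MHz bandwidth (Table 1), i.e., roughly 7.2835-7.3165 GHz.
print(spectral_alignment(6.3, 8.2, 7.2835, 7.3165))  # ~1.0 (fully covered)
```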
- 2. Signal-to-jamming ratio at the radar receiver.
The detection probability on the radar side is closely associated with the SJR at its receiving end. The objective of cooperative-jamming resource allocation among multiple jammers is to appropriately determine the jamming power for jammers, thereby reducing the SJR at the radar end. Consequently, the SJR at the radar end serves as an evaluative metric for the jamming effectiveness at a given jamming power.
In the system model established in this paper, the jammers jam the radars to conceal themselves from detection by said radars. Therefore, the distance between the radar and the target is assumed to be equal to the distance between the radar and the jammer. Consequently, based on the radar equation, the SJR obtained by the radar can be expressed as follows:
$$\mathrm{SJR}_{r} = \frac{P_t G_t \sigma}{4 \pi P_j G_j \gamma R^2} \tag{12}$$

where $P_t$ and $G_t$ denote the transmit power and antenna gain of the radar; $P_j$ and $G_j$ denote the transmit power and antenna gain of the jammer; $\sigma$ denotes the target's radar cross section; $\gamma$ denotes the polarization matching loss coefficient between the jamming and radar signals; and R denotes the distance from the radar to the target, which equals the distance from the radar to the jammer.
Similarly, the SJR obtained by the jammer, defined as the ratio of the radar signal power received at the jammer to the jamming power it transmits, can be derived as follows:

$$\mathrm{SJR}_{j} = \frac{P_t G_t G_j \lambda^2}{(4\pi)^2 R^2 P_j} \tag{13}$$

where $\lambda$ denotes the wavelength of the radar signal, and the numerator is the radar signal power received by the jammer at distance R.
The ratio between the SJR obtained by the radar and that obtained by the jammer can then be obtained using the following formula:

$$\frac{\mathrm{SJR}_{r}}{\mathrm{SJR}_{j}} = \frac{4 \pi \sigma}{G_j^2 \lambda^2 \gamma} \tag{14}$$
Hence, the jammer can estimate the SJR at the radar receiver from the SJR measured on its own platform, since the ratio in Equation (14) depends only on parameters known to the jammer. To assess the jamming effect of jammer m transmitting a jamming signal with power $p_{mn}$ against radar n, the estimated SJR at radar n is written as follows:

$$\mathrm{SJR}_{mn} = \frac{P_{t,n} G_{t,n} \sigma}{4\pi\, p_{mn} G_{j,m} \gamma\, d_{mn}^{2}} \tag{15}$$

where $P_{t,n}$ and $G_{t,n}$ denote the transmit power and antenna gain of radar n; $G_{j,m}$ denotes the antenna gain of jammer m; and $d_{mn}$ denotes the distance between jammer m and radar n.
A radar receiver SJR benefit factor, denoted as $f_2$, is defined to map the estimated SJR to a normalized benefit, as follows:

$$e_{mn} = \begin{cases} 1, & \mathrm{SJR}_{mn} \le 3\ \mathrm{dB} \\ \dfrac{14 - \mathrm{SJR}_{mn}}{14 - 3}, & 3\ \mathrm{dB} < \mathrm{SJR}_{mn} < 14\ \mathrm{dB} \\ 0, & \mathrm{SJR}_{mn} \ge 14\ \mathrm{dB} \end{cases} \tag{16}$$

where $e_{mn}$ is the per-pair benefit, consistent with the SJR thresholds of Equation (6), and $f_2$ is obtained by averaging $e_{mn}$ over all allocated jammer–radar pairs; a larger $f_2$ indicates a lower detection capability at the radar end under the current power allocation.
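Because only the 3 dB and 14 dB thresholds of Equation (16) are stated explicitly in the text, the following piecewise-linear mapping is one plausible reading, offered as a sketch rather than as the paper's exact definition:

```python
def sjr_benefit(sjr_db: float) -> float:
    """Map a radar-end SJR (dB) to a benefit in [0, 1] (sketch of Equation (16)).

    Below 3 dB the radar is effectively jammed (benefit 1); above 14 dB the
    jamming is ineffective (benefit 0); in between the benefit falls linearly.
    """
    if sjr_db <= 3.0:
        return 1.0
    if sjr_db >= 14.0:
        return 0.0
    return (14.0 - sjr_db) / (14.0 - 3.0)
```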
- 3. Distance between jammers and radars.
According to Equation (12), the SJR at the radar end depends directly on the jammer–radar distance, so the spatial geometry between jammers and radars influences the achievable jamming effect. Assuming the position of jammer m is denoted as $(x_m, y_m, z_m)$ and the position of radar n as $(x_n, y_n, z_n)$, the distance between them is:

$$d_{mn} = \sqrt{(x_m - x_n)^2 + (y_m - y_n)^2 + (z_m - z_n)^2} \tag{17}$$
An overall distance benefit factor $f_3$ is then defined in Equation (18) by normalizing and aggregating the distances $d_{mn}$ of all allocated jammer–radar pairs, where a larger $f_3$ indicates a spatial configuration that is more favorable for jamming.
- 4. Number of jammed radars.
In practical scenarios, the number of radars that can be jammed may be constrained by factors such as the quantity of jammers, payload limitations, and constraints in the spatial, frequency, and energy domains. Consequently, not all detected radars can be jammed by the jammers. The objective of cooperative-jamming resource allocation involving multiple jammers is to maximize the jamming coverage over the detected radars. To assess the jamming effectiveness of the current resource allocation scheme, a benefit factor for the number of jammed radars was defined as follows:
$$f_4 = \frac{1}{N} \sum_{n=1}^{N} \sum_{m=1}^{M} k_{mn} \tag{19}$$

where the double sum counts the radars that are allocated a jamming beam (at most one beam per radar, by Equation (5)), so that $f_4 \in [0, 1]$ is the fraction of detected radars being jammed.
Employing a linear weighting method to balance the four evaluation factors described above, these factors can be integrated into the following unified objective function:

$$J = \omega_1 f_1 + \omega_2 f_2 + \omega_3 f_3 + \omega_4 f_4 \tag{20}$$

where $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ are the weighting coefficients of the spectral alignment, SJR benefit, distance benefit, and jammed-radar-count factors, respectively, and satisfy $\omega_1 + \omega_2 + \omega_3 + \omega_4 = 1$.
J represents the jamming benefit, and a larger value indicates more effective jamming of the entire cooperative jamming system with respect to the radars. The magnitude of J can be used to gauge the rationality of jamming resource allocation, thereby enhancing the utilization efficiency of jamming resources.
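Assembled from the four factors, the objective of Equation (20) is a one-line weighted sum. In the sketch below, the equal weights are placeholders; the paper's actual weighting factors are not given in the text:

```python
def jamming_benefit(f1, f2, f3, f4, w=(0.25, 0.25, 0.25, 0.25)):
    """Jamming benefit J of Equation (20) as a linear weighting of the factors.

    f1: spectral alignment, f2: radar-end SJR benefit,
    f3: distance benefit, f4: jammed-radar-count benefit.
    """
    assert abs(sum(w) - 1.0) < 1e-9, "weights must sum to 1"
    return w[0] * f1 + w[1] * f2 + w[2] * f3 + w[3] * f4
```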
An evolutionary reinforcement learning method called the hybrid DBO-QL algorithm was developed for the cooperative-jamming resource allocation model involving the complex constraints constructed in Section 2. The principle of the proposed algorithm is elaborated in detail below.
The DBO algorithm, introduced by Xue et al. in 2022, is a swarm intelligence optimization algorithm inspired by the ball-rolling, dancing, breeding, foraging, and stealing behaviors of dung beetles.
In the DBO algorithm, each dung beetle's position corresponds to a candidate solution. The beetles exhibit five behaviors: rolling a ball, using celestial cues such as the sun for navigation so that the ball is rolled along a straight line; dancing, which allows a dung beetle to reorient itself; reproduction, in which dung beetles roll their brood balls to a secure location, hide them, and use them as a breeding ground; foraging, in which adult dung beetles emerge from the ground to search for food; and stealing, in which certain dung beetles, known as thieves, steal dung balls from other beetles.
Therefore, the dung beetle population in the algorithm is divided into four categories, namely, the ball-rolling dung beetle, the brood ball, the small dung beetle, and the thief, as shown in Figure 3. The population is divided into the different roles in a ratio of 6:6:7:11. In other words, according to the population size in Figure 3, out of 30 individuals, six dung beetles are assigned to engage in ball-rolling behavior. These ball-rolling dung beetles adjust their running direction based on various natural environmental influences, initially searching for a safe location for foraging. Another six dung beetles are designated as beetles engaging in reproductive behavior, and the reproduction balls will be placed in a known safe area. Seven dung beetles are defined as small dung beetles, which forage in the optimal foraging area. The remaining eleven dung beetles are classified as thieves, and thief dung beetles search for food based on the positions of other dung beetles and the optimal foraging area.
The conditions for ball rolling carried out by dung beetles can be categorized into two scenarios: obstacle-free conditions and conditions involving obstacles.
- 1. The scenario without obstacles
When there are no obstacles along its path, the ball-rolling dung beetle uses the sun as a navigation reference to keep its dung ball rolling along a straight trajectory, although natural factors can still perturb its course. The position-updating mechanism during the ball-rolling process is represented by Equation (21):

$$x_i(t+1) = x_i(t) + \alpha \cdot k \cdot x_i(t-1) + b \cdot \Delta x, \qquad \Delta x = \left| x_i(t) - X^{w} \right| \tag{21}$$

where t denotes the current iteration number; $x_i(t)$ represents the position of the ith dung beetle at iteration t; $k \in (0, 0.2]$ denotes the deflection coefficient; $b \in (0, 1)$ is a constant; $\alpha$ is a natural coefficient that takes the value 1 or −1; $X^{w}$ denotes the current global worst position; and $\Delta x$ simulates variations in light intensity.
- 2. The scenario with obstacles
When dung beetles encounter obstacles that hinder their progression, they adjust their direction through a dancing mechanism. The position update during a ball-rolling dung beetle's dance is defined as follows:

$$x_i(t+1) = x_i(t) + \tan(\theta) \left| x_i(t) - x_i(t-1) \right| \tag{22}$$

where $\theta \in [0, \pi]$ denotes the deflection angle; when $\theta$ equals 0, $\pi/2$, or $\pi$, the position is not updated.
In dung beetle reproduction, a boundary selection strategy is used to simulate the region in which female dung beetles lay their eggs. The oviposition area is defined in Equation (23):

$$Lb^{*} = \max\left( X^{*} (1 - R),\ Lb \right), \qquad Ub^{*} = \min\left( X^{*} (1 + R),\ Ub \right) \tag{23}$$

where $X^{*}$ denotes the current local best position; $R = 1 - t / T_{\max}$, with $T_{\max}$ the maximum iteration number; and $Lb$ and $Ub$ denote the lower and upper bounds of the search space.
Once the oviposition area, with lower bound $Lb^{*}$ and upper bound $Ub^{*}$, has been determined, the position of each brood ball is updated within this area during the iterations, as follows:

$$B_i(t+1) = X^{*} + b_1 \left( B_i(t) - Lb^{*} \right) + b_2 \left( B_i(t) - Ub^{*} \right) \tag{24}$$

where $B_i(t)$ denotes the position of the ith brood ball at iteration t, and $b_1$ and $b_2$ are two independent random vectors of size $1 \times D$, with D the dimension of the optimization problem.
To guide the dung beetle larvae (small dung beetles) in their search for food, an optimal foraging area is established to simulate their foraging behavior, defined in Equation (25):

$$Lb^{b} = \max\left( X^{b} (1 - R),\ Lb \right), \qquad Ub^{b} = \min\left( X^{b} (1 + R),\ Ub \right) \tag{25}$$

Here, R remains consistent with the previous definition, and $X^{b}$ denotes the current global best position.
With the lower bound $Lb^{b}$ and upper bound $Ub^{b}$ of the optimal foraging area defined, the position of each small dung beetle is updated as follows:

$$x_i(t+1) = x_i(t) + C_1 \left( x_i(t) - Lb^{b} \right) + C_2 \left( x_i(t) - Ub^{b} \right) \tag{26}$$

where $C_1$ is a random number following a normal distribution and $C_2$ is a random vector with entries in $(0, 1)$.
Within the dung beetle population, some individuals, called thieves, steal dung balls from other dung beetles. The position update process for the thieving dung beetles is expressed in Equation (27):

$$x_i(t+1) = X^{b} + S \cdot g \cdot \left( \left| x_i(t) - X^{*} \right| + \left| x_i(t) - X^{b} \right| \right) \tag{27}$$

where g represents a random vector of size $1 \times D$ following a normal distribution and S denotes a constant.
A flowchart of the DBO algorithm is shown in Figure 4, which primarily consists of the following six steps:
- Initialize the dung beetle populations and set the parameters of the DBO algorithm;
- Calculate the fitness values for all dung beetle positions based on the objective function;
- Update the positions of all dung beetle populations according to the set rule;
- Check whether each updated dung beetle has exceeded the boundaries;
- Update the current optimal solution and its fitness value;
- Repeat the above steps; once the iteration count t reaches the maximum iteration number, output the global optimal value and its corresponding solution. (A compact Python sketch of this workflow is given below.)
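To make the workflow above concrete, the following compact Python sketch implements a simplified DBO loop over a continuous search space. It keeps the ball-rolling and dancing updates of Equations (21) and (22) but collapses the brood-ball, foraging, and thief rules of Equations (23)-(27) into a single contraction toward the current best, so it illustrates the control flow rather than faithfully reimplementing the algorithm:

```python
import numpy as np

def dbo_sketch(fitness, dim, lb, ub, pop=30, T=100, seed=0):
    """Simplified DBO loop (maximization): roll/dance for the first subgroup,
    contraction toward the best-known region for the remaining beetles."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(pop, dim))
    prev = X.copy()                                # x_i(t-1) for Eqs. (21)/(22)
    fit = np.array([fitness(x) for x in X])
    best, best_fit = X[fit.argmax()].copy(), fit.max()
    for t in range(T):
        worst = X[fit.argmin()].copy()
        new = X.copy()
        for i in range(pop):
            if i < 6:                              # ball-rolling beetles
                if rng.random() < 0.9:             # no obstacle: Eq. (21)
                    alpha = 1.0 if rng.random() <= 0.5 else -1.0
                    new[i] = X[i] + alpha * 0.1 * prev[i] + 0.3 * np.abs(X[i] - worst)
                else:                              # obstacle: dance, Eq. (22)
                    theta = rng.uniform(0.0, np.pi)
                    new[i] = X[i] + np.tan(theta) * np.abs(X[i] - prev[i])
            else:                                  # stand-in for Eqs. (23)-(27)
                R = 1.0 - t / T
                new[i] = best + rng.normal(size=dim) * 0.1 * R * (ub - lb)
        prev = X
        X = np.clip(new, lb, ub)
        fit = np.array([fitness(x) for x in X])
        if fit.max() > best_fit:
            best, best_fit = X[fit.argmax()].copy(), fit.max()
    return best, best_fit

# Usage: maximize a toy concave fitness over [-5, 5]^3
b, f = dbo_sketch(lambda x: -np.sum((x - 1.0) ** 2), dim=3, lb=-5.0, ub=5.0)
```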
As a value-function-based method, Q-learning is a typical temporal difference (TD) algorithm that learns the value of state–action pairs directly from interaction, without requiring a model of the environment.
The Q-learning algorithm updates its value function based on the immediate reward obtained from the environment and the estimated value of the next state. Thus, at time t, the TD target can be expressed as follows:

$$U_t = r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) \tag{28}$$
Hence, the update function can be expressed as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{29}$$

where $s_t$ and $a_t$ denote the state and action at time t; $r_{t+1}$ denotes the immediate reward; $\alpha \in (0, 1]$ denotes the learning rate; and $\gamma \in [0, 1]$ denotes the discount factor.
When the agent selects an action, to avoid falling into local optima, the choice of strategy follows an ε-greedy policy: with probability 1 − ε the agent exploits the action with the highest current Q-value, and with probability ε it explores a random action, as expressed in Equation (30):

$$\pi(s_t) = \begin{cases} \arg\max_{a} Q(s_t, a), & \text{with probability } 1 - \varepsilon \\ \text{a random action}, & \text{with probability } \varepsilon \end{cases} \tag{30}$$
The optimization process of the Q-learning algorithm is an exploration–exploitation process. After reaching the maximum iteration count, a convergent state–action two-dimensional table, known as the Q-table, is obtained.
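A tabular implementation of the update of Equation (29) with the ε-greedy policy of Equation (30) is sketched below. Here, env_step is a hypothetical environment interface; in the setting of this paper it would evaluate a power-adjustment action against the resulting jamming benefit:

```python
import numpy as np

def q_learning(env_step, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning; env_step(s, a) -> (s_next, reward, done)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection, Eq. (30)
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            # temporal-difference update, Eq. (29)
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q
```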
Since the "many-to-many" adversarial scenario model proposed in Section 2 has numerous input parameters, complex constraints, and a multi-peaked objective function, commonly used metaheuristic swarm intelligence algorithms for solving optimization problems often exhibit slow convergence and unsatisfactory convergence results. On the other hand, it is difficult to apply reinforcement learning algorithms with good convergence results and high efficiency in situations with multiple inputs and complex constraints. To address this, this paper proposes a combination of the DBO algorithm, which is an evolutionary algorithm, and the classical Q-Learning algorithm from reinforcement learning. Through this evolutionary reinforcement learning method, this paper aims to tackle the jamming resource allocation problem in a scenario where multiple jammers are jamming multiple radars.
Assuming the adversarial process between the jammers and radars involves a total of U adversarial rounds at a certain time, the process of cooperative jamming resource allocation with joint multi-domain information proceeds as follows.
In the uth adversarial round, the friendly side acquires information about jammers and radars through situational awareness and electromagnetic spectrum sensing in the external environment. Using radar–signal–deinterleaving technology, it determines the quantity, positions, and parameters of the radar radiation sources. Subsequently, based on the received information about the parameters of jammers and radar radiation sources, and with specific constraints in mind, the outer layer of the DBO algorithm is employed to assess which radars can be jammed by the jamming beams of the same jammer. This process generates the jamming beam allocation matrix K. Furthermore, the inner layer of the Q-learning algorithm, utilizing the jamming beam allocation matrix K as a foundation, allocates the transmission power for each jamming beam. This generates the jamming power allocation matrix P, completing the cooperative jamming resource allocation for multiple jammers in the adversarial round. The process then proceeds to the (u + 1)th adversarial round. The method for cooperative jamming resource allocation with joint multi-domain information is illustrated in Figure 5.
The objective of the outer-layer DBO algorithm is to find the optimal jamming beam allocation matrix K while satisfying constraints (3)-(5). Each dung beetle's position encodes a candidate matrix K, in which every dimension takes the value 0 or 1, and the fitness of a candidate allocation is evaluated with the fitness function given in Equation (31), where a higher fitness value corresponds to a higher jamming benefit achievable under that beam allocation.
The objective of the inner-layer Q-learning algorithm is to determine the optimal jamming power allocation matrix P under the guidance of the optimal jamming beam allocation matrix K, thereby completing the cooperative jamming resource allocation with multiple jammers. To this end, the design of the inner-layer Q-learning algorithm, namely the state space, action space, reward function, and state-transition rule of the agent, is given in Equations (32)-(35), respectively.
Thus, the jamming effect of the jamming power allocation matrix P determined by the inner-layer Q-learning algorithm can be assessed using the radar receiver SJR benefit factor $f_2$, as expressed in Equation (36).
Therefore, for the proposed hybrid DBO-QL algorithm, the jamming benefit defined by Equation (20) is evaluated jointly from the outputs of the two layers, as expressed in Equation (37), where the outer layer determines the evaluation factors that depend on the beam allocation matrix K and the inner layer determines the factor that depends on the power allocation matrix P.
The pseudo-code of the hybrid DBO-QL algorithm proposed in this paper is presented in Algorithm 1.

Algorithm 1 The hybrid DBO-QL algorithm
1: Initialize the population and parameters for the DBO algorithm.
2: while (t ≤ maximum iteration number) do
3:   for (each ball-rolling dung beetle) do
4:     λ = rand(1);
5:     if (λ < 0.9) then
6:       δ = rand(1);
7:       if (δ ≤ 0.5) then
8:         α = 1;
9:       else
10:        α = −1;
11:      end if
12:      Update the ball-rolling dung beetle's position using Equation (21);
13:    else
14:      Update the ball-rolling dung beetle's position using Equation (22);
15:    end if
16:  end for
17:  for (each brood ball) do
18:    Update the brood ball's position using Equation (24);
19:  end for
20:  for (each small dung beetle) do
21:    Update the small dung beetle's position using Equation (26);
22:  end for
23:  for (each thief) do
24:    Update the thief's position using Equation (27);
25:  end for
26:  Check whether each updated position exceeds the boundaries and correct it if necessary;
27:  Evaluate the fitness of all positions; if a better solution is found, update it;
28:  t = t + 1;
29: end while
30: Output the optimal jamming beam allocation matrix K.
31: Initialize the Q-table matrix and the parameters for the Q-learning algorithm.
32: repeat
33:   Obtain the initial state s_t;
34:   Use the ε-greedy policy of Equation (30) to select an action a_t;
35:   Obtain the reward r_(t+1) from the environment;
36:   Update the Q-table by using Equation (29);
37:   Update the next state s_(t+1);
38: until the Q-table converges
39: Output the optimal jamming power allocation matrix P.
To highlight the effectiveness, superiority, and timeliness of the proposed hybrid DBO-QL algorithm with respect to the joint optimization problem of jamming beam and jamming power allocation, this paper conducted simulation experiments concerning a complex scenario involving "multiple jammers against multiple radars". Within the same simulation environment, the hybrid DBO-QL algorithm was compared with the DBO algorithm and the PSO algorithm in terms of jamming benefit, algorithm response time, and optimization success rate. To ensure accuracy and fairness in the comparison, all the operational parameters of the jammers and radars were maintained consistently.
Assuming that at a certain moment, there are a total of M = 4 jammers conducting cooperative jamming tasks against N = 10 radars in the adversarial space, the radar parameter information and jammer parameter information sets during the simulation are presented in Table 1 and Table 2, respectively.
The spatial relationship between the jammers and radars is illustrated in Figure 6.
The spatial condition constraint matrix at this moment is presented in Table 3.
Other parameter settings for the adversarial scenario are provided in Table 4.
The specific parameter settings for each algorithm are as follows:
For the hybrid DBO-QL algorithm, the weighting factors of the objective function, the population size and maximum iteration number of the outer-layer DBO algorithm, and the discount factor of the inner-layer Q-learning algorithm were configured first. In addition, the learning rate α and the exploration factor ε of the inner layer were designed as functions of the iteration count, as defined in Equations (38) and (39), whose function curves are plotted in Figure 7; both parameters take larger values in the early iterations to encourage exploration and decay in later iterations to promote stable convergence.
The parameter settings for the DBO algorithm and the PSO algorithm, including the weighting factors of the objective function, the population size, and the maximum iteration number, were kept consistent with those of the hybrid DBO-QL algorithm wherever applicable to ensure a fair comparison.
The convergence results for the hybrid DBO-QL algorithm are illustrated in Figure 8. It can be observed that the outer-layer DBO algorithm reached convergence around the 10th iteration, while the inner-layer Q-learning algorithm achieved convergence around the 9000th iteration.
The jamming beam allocation matrix K and the jamming power allocation matrix P obtained at convergence are presented in Table 5 and Table 6, respectively.
The jamming beam allocation matrix and jamming power allocation matrix shown in Table 5 and Table 6 represent the optimal cooperative jamming resource allocation scheme output by the algorithm. According to this scheme, 9 of the 10 radars in the adversarial scenario were allocated jamming beams. Taking the first jammer as an example, it was assigned to jam the sixth and tenth radars, with both jamming beams having a transmission power of 0.10 kW. Moreover, Tables 5 and 6 show that only the ninth radar was not assigned a jamming beam. This follows from the spatial condition constraint matrix (Table 3), in which only the third jammer can be assigned to jam the ninth radar. However, the radar and jammer parameters in Table 1 and Table 2 show that the operating frequency range of the ninth radar (9.5 GHz center frequency, 20 MHz bandwidth) does not overlap with the jamming frequency range of the third jammer (7.5 to 9.4 GHz). Their frequency-domain intersection is 0, so the third jammer cannot effectively jam the ninth radar, and no allocation is made; this reflects a situation that can occur in practice.
To provide a more intuitive display of the results of cooperative jamming resource allocation with multiple jammers, the outputs of the outer-layer DBO algorithm and the inner-layer Q-learning algorithm are depicted separately in Figure 9 and Figure 10.
In order to evaluate the stability and adaptability of the algorithms in solving the optimization problem with complex constraints, the optimization success rate $P_s$ was introduced as an additional evaluation metric, defined as follows:

$$P_s = \frac{N_{\mathrm{succ}}}{N_{\mathrm{total}}} \times 100\% \tag{40}$$

where $N_{\mathrm{succ}}$ denotes the number of Monte Carlo runs in which the algorithm converges to a feasible allocation satisfying all constraints and $N_{\mathrm{total}}$ denotes the total number of runs.
Under the same simulation scenario, model parameters, and hardware conditions, 200 Monte Carlo experiments were conducted for the hybrid DBO-QL, DBO, and PSO algorithms to obtain the optimal jamming benefit, average algorithm response time, and optimization success rate shown in Table 7. The algorithm response time is calculated as follows: for the swarm intelligence optimization algorithms, it is the time elapsed from the start of the algorithm to reaching the convergence state; for the Q-learning algorithm, it is measured after training has converged, as the time elapsed from the input of the next state to the output of the corresponding action by the agent.
The simulation results validated the effectiveness, superiority, and timeliness of the proposed hybrid DBO-QL algorithm. Compared with the DBO and PSO algorithms, the proposed hybrid DBO-QL algorithm improved the jamming benefit by 3.03% and 6.25%, reduced the algorithm response time by 97.35% and 96.57%, and improved the optimization success rate by 26.33% and 50.26%, respectively. For the DBO and PSO algorithms, additional constraints increase the difficulty of optimization; when facing problems with complex constraints, these algorithms often need larger populations to raise their optimization success rates, but enlarging the population prolongs the response time. Consequently, a trade-off between response time and success rate must be struck when using them. The hybrid DBO-QL algorithm benefits from its two-layer structure: the outer-layer DBO algorithm searches a discrete solution space in which each dimension takes only the values 0 and 1, which significantly reduces the optimization difficulty and yields improved convergence, better stability, and lower time consumption. For the inner-layer Q-learning algorithm, once the agent has been trained, it can rapidly select the optimal action for a given input state based on the learned policy. Unlike the two swarm intelligence optimization algorithms, the Q-learning algorithm does not require repeated population evaluations at decision time, leading to a significant reduction in response time.
Table 7 also reveals that, compared to the PSO algorithm, the DBO algorithm provides a greater jamming benefit but at the cost of a longer response time. This is due to the fact that the DBO algorithm incorporates optimization strategies inspired by the rolling, dancing, foraging, stealing, and reproductive behaviors of dung beetles, which enhance its convergence effectiveness but prolong the algorithm response time.
This paper addresses the problem of cooperative jamming resource allocation with multiple jammers in a "many-to-many" scenario, proposing a method based on evolutionary reinforcement learning. This method comprehensively considers information from spatial, frequency, and energy domains to construct constraints and an objective function. It characterizes the jamming resource allocation scheme through the jamming beam allocation matrix and jamming power allocation matrix and optimizes these matrices using the outer-layer DBO algorithm and the inner-layer Q-learning algorithm, respectively. This approach achieves cooperative jamming resource allocation among multiple jammers by leveraging radar parameter information and jammer parameter information. In the simulation experiments, commonly used swarm intelligence optimization algorithms, specifically the DBO algorithm and the PSO algorithm, were selected for comparison in terms of jamming benefit, algorithm response time, and optimization success rate. The results demonstrate that the proposed algorithm outperforms the other two swarm intelligence optimization algorithms, obtaining higher jamming benefits and optimization success rates and showing significant advantages in algorithm response time.
DIAGRAM: Figure 1 Schematic diagram of the adversarial scenario.
Graph: Figure 2 Spectral overlap between the jammer and radar in the frequency domain: scenarios (a)-(c), in which the jamming band and radar band overlap, and scenarios (d) and (e), in which they do not.
Graph: Figure 3 The partitioning rules for the population.
Graph: Figure 4 DBO algorithm flow chart.
Graph: Figure 5 Cooperative jamming resource allocation with joint multi-domain information.
Graph: Figure 6 Visualization of the adversarial scenario.
Graph: Figure 7 Function plots of learning rate α and exploration factor ε: (a) function plot of learning rate α; (b) function plot of exploration factor ε.
Graph: Figure 8 Convergence curve of the hybrid DBO-QL algorithm: (a) the convergence curve of the outer-layer DBO algorithm; (b) the convergence curve of the inner-layer Q-learning algorithm.
Graph: Figure 9 Visualization of DBO algorithm output.
Graph: Figure 10 Visualization of Q-learning algorithm output.
Table 1 Radar parameter information.
Radar ID | Position/km | Center Frequency/GHz | Bandwidth/MHz | Pulse Repetition Frequency/Hz | Pulse Width/µs | Power/kW
No.1 | (83.2, 48.2, 0.1) | 8.4 | 84.0 | 2616.0 | 5.0 | 154.3
No.2 | (92.1, 1.2, 0.1) | 8.8 | 80.0 | 1706.0 | 16.0 | 148.1
No.3 | (75.9, 68.4, 0.0) | 9.2 | 42.0 | 2337.0 | 8.0 | 162.0
No.4 | (83.7, 24.9, 0.0) | 6.7 | 100.0 | 844.0 | 45.0 | 159.0
No.5 | (96.3, 14.2, 0.1) | 6.6 | 90.0 | 1412.0 | 22.0 | 141.2
No.6 | (79.2, 32.8, 0.1) | 7.3 | 33.0 | 1920.0 | 15.0 | 165.5
No.7 | (92.3, 98.5, 0.0) | 8.2 | 62.0 | 855.0 | 43.0 | 162.4
No.8 | (89.6, 61.8, 0.1) | 9.4 | 50.0 | 2252.0 | 10.0 | 144.7
No.9 | (73.8, 92.2, 0.1) | 9.5 | 20.0 | 1262.0 | 24.0 | 142.6
No.10 | (77.4, 84.9, 0.0) | 6.4 | 92.0 | 1180.0 | 26.0 | 165.0
Table 2 Jammer parameter information.
Jammer ID | Position/km | Azimuth Angle/° | Elevation Angle/° | Jamming Frequency Range/GHz | Power/kW
No.1 | (28.2, 31.1, 1.9) | 0.5 | 45 | (6.3, 8.2) | 0.2
No.2 | (3.6, 26.6, 2.2) | 0 | 47.5 | (6.6, 8.5) | 0.2
No.3 | (24.0, 77.5, 2.4) | 0 | 42.5 | (7.5, 9.4) | 0.2
No.4 | (10.0, 52.3, 1.8) | −0.5 | 45 | (7.9, 9.7) | 0.2
Table 3 Spatial condition constraint matrix.
 | Radar 1 | Radar 2 | Radar 3 | Radar 4 | Radar 5 | Radar 6 | Radar 7 | Radar 8 | Radar 9 | Radar 10
Jammer 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1
Jammer 2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1
Jammer 3 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0
Jammer 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1
Table 4 Other parameter settings for the adversarial scenario.
Parameters | Values
Radar Antenna Gain/dB | 30
Jammer Antenna Gain/dB | 5
Polarization Matching Loss Coefficient between Jamming and Radar Signals | 0.5
Target's Radar Cross Section Area/m² | 1
Table 5 Jamming beam allocation matrix.
 | Radar 1 | Radar 2 | Radar 3 | Radar 4 | Radar 5 | Radar 6 | Radar 7 | Radar 8 | Radar 9 | Radar 10
Jammer 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1
Jammer 2 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0
Jammer 3 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Jammer 4 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0
Table 6 Jamming power allocation matrix (Unit: kW).
 | Radar 1 | Radar 2 | Radar 3 | Radar 4 | Radar 5 | Radar 6 | Radar 7 | Radar 8 | Radar 9 | Radar 10
Jammer 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00 | 0.00 | 0.10
Jammer 2 | 0.03 | 0.00 | 0.00 | 0.05 | 0.05 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00
Jammer 3 | 0.00 | 0.11 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Jammer 4 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00
Table 7 The evaluation results of different algorithms.
Evaluation Indicator | DBO-QL | DBO | PSO
Jamming Benefit | 0.68 | 0.66 | 0.64
Algorithm Response Time/s | 0.11 | 4.15 | 3.21
Optimization Success Rate/% | 98.93 | 78.31 | 65.84
Conceptualization, Q.X. and T.C.; methodology, Q.X.; software, Q.X.; validation, Q.X., Z.X., and T.C.; formal analysis, Q.X. and Z.X.; investigation, Q.X. and T.C.; resources, T.C.; data curation, Q.X. and Z.X.; writing—original draft preparation, Q.X.; writing—review and editing, Q.X., Z.X., and T.C.; visualization, Q.X.; supervision, T.C.; project administration, T.C.; funding acquisition, T.C. All authors have read and agreed to the published version of the manuscript.
The data can be obtained by contacting the corresponding author.
The authors declare no conflicts of interest.