Joint Client Selection and CPU Frequency Control in Wireless Federated Learning Networks with Power Constraints

Zhaohui Zhou, Shijie Shi, Fasong Wang, Yanbin Zhang and Yitong Li

Entropy 2023, 25(8), 1183; DOI: 10.3390/e25081183

Federated learning (FL) represents a distributed machine learning approach that eliminates the necessity of transmitting privacy-sensitive local training samples. However, within wireless FL networks, resource heterogeneity introduces straggler clients, thereby decelerating the learning process. The learning process is further slowed by the non-independent and identically distributed (non-IID) nature of local training samples. Coupled with resource constraints during the learning process, these challenges create an imperative need to optimize client selection and resource allocation strategies. While numerous studies have made strides in this regard, few have considered the joint optimization of client selection and computational power (i.e., CPU frequency) for both clients and the edge server during each global iteration. In this paper, we first define a cost function encompassing learning latency and non-IID characteristics. Subsequently, we pose a joint client selection and CPU frequency control problem that minimizes the time-averaged cost function subject to long-term power constraints. By utilizing Lyapunov optimization theory, the long-term optimization problem is transformed into a sequence of short-term problems. Finally, an algorithm is proposed to determine the optimal client selection decision and the corresponding optimal CPU frequencies for both the selected clients and the server. Theoretical analysis provides performance guarantees, and our simulation results substantiate that our proposed algorithm outperforms comparative algorithms in terms of test accuracy while maintaining low power consumption.

Keywords: federated learning; client selection; CPU frequency control; wireless networks

1. Introduction

As the rapid expansion of Internet of Things (IoT) communication unfolds, an immense volume of data generated by massive machine-type devices circulates via wireless access technology. This advancement instigates the pervasive utilization of machine-learning-based applications in various aspects of people's everyday lives, including smart transportation and smart healthcare. In the traditional centralized learning paradigm, the raw data of each device are initially uploaded to the edge server via a wireless channel, facilitating aggregation for subsequent processing and analysis. This methodology, however, potentially leads to the leakage of private data. Consequently, the exploration of machine learning mechanisms that protect user data privacy is of paramount significance. FL was proposed as a solution to address the limitations of traditional centralized machine learning methods in ensuring user data privacy [1]. This approach permits each device to partake in the collaborative training of a shared model, negating the need to share the device's proprietary data. Since its inception, federated learning has garnered considerable interest from both academia and industry, finding broad applications in fields such as mobile cloud computing [2], the industrial Internet of Things [3], and device-to-device communication [4].

Despite the significant advantages offered by federated learning, its application in wireless networks presents certain challenges that warrant attention. Firstly, the scarcity of wireless resources, such as channel bandwidth, limits the number of clients capable of participating in each learning iteration. Additionally, client selection for each iteration needs to be guided by the real-time state of the channel. Secondly, the computational power and power budgets of the devices and the edge server are finite, necessitating the optimization of computational power and power allocation throughout the multi-iteration learning process in FL, which is essential both for extending battery life and for advancing green communication. A third challenge arises from the latency incurred during the federated learning process, which constrains its use in latency-sensitive scenarios. This latency is determined by the time straggler clients require to train and upload their local models in each learning iteration and by the time the edge server spends aggregating them, both of which are influenced by the transmission power and CPU frequency in each iteration. Furthermore, the heterogeneity of client behavior and the variable dynamics of wireless environments may lead to the acquisition of non-independent and identically distributed (non-IID) training data [5]. Some clients may hold data that deviate significantly from independent and identically distributed (IID) training data, making it challenging for the model to generalize effectively across all clients.

1.1. Related Work

Since the proposition of federated learning, an extensive research effort has been devoted to improving its performance in wireless networks. This advancement primarily hinges on designing appropriate client selection schemes [6,7] or optimizing resource allocation [8]–[10]. For instance, a client scheduling strategy based on channel and learning qualities was proposed in [7]. Ref. [9] investigated the joint CPU frequency and transmission power control of all IoT clients to minimize energy consumption under a latency requirement. The studies presented in [11]–[13] focused on optimizing the client selection process and resource allocation simultaneously, with the objective of improving federated learning performance. Specifically, Ref. [11] delved into a joint client selection and resource allocation problem, seeking to optimize the trade-off between the number of selected clients and the total energy consumption, while [12] focused on minimizing training loss under delay and energy consumption constraints. Additionally, Ref. [13] explored client selection and resource allocation under non-IID data distributions.

The aforementioned studies were formulated in the context of individual global iterations, overlooking the interdependence between different iterations. This neglects the cumulative learning effect over multiple iterations, potentially yielding less effective learning models and limiting overall system performance. Hence, the need for long-term optimization that accounts for the interconnectedness of global iterations becomes evident. Numerous research efforts [14]–[18] have targeted long-term optimization in federated learning, focusing on various aspects of the problem. For instance, Ref. [14] sought to optimize the client selection process in each learning round with the aim of minimizing training latency under fairness constraints. The study presented in [15] introduced a dynamic scheduling mechanism aimed at optimizing the federated learning process, striking a balance between the enhancement of learning performance and the reduction of training latency. Ref. [16] focused on the optimization of radio transmission parameters and computation resources, attempting to minimize power consumption while upholding learning performance and latency constraints. Refs. [17,18] focused on client selection and bandwidth allocation under energy constraints in wireless FL networks. Specifically, the study in [17] aimed to maximize the weighted sum of selected clients, whereas [18] focused on minimizing the cost of time and accuracy. While the above works explored long-term optimization in federated learning, the joint optimization of latency and of the impact of non-IID data under long-term power constraints on both the clients and the server has not been considered.

1.2. Contribution

In this paper, we consider a client selection and CPU frequency control problem in wireless FL networks. Different from the extant literature, our approach concurrently optimizes the selection of clients and CPU frequency for both clients and the edge server. The objective of the proposed problem is to minimize a predefined cost function, which incorporates latency and model robustness, under long-term power constraints. The main contributions of our work are as follows:

  • (1) We develop a comprehensive framework for the long-term client selection and CPU frequency control problem, taking into account the interdependence of different global iterations and long-term power consumption constraints for both clients and the server. The aim is to expedite the learning process by incorporating client and server latency, as well as the effect of the non-IID distribution of local training samples.
  • (2) Leveraging Lyapunov optimization theory, we transform the long-term problem into a set of per-iteration problems. We introduce an algorithm to tackle the per-iteration problem, accompanied by a theoretical performance guarantee.
  • (3) We conduct extensive experiments, inclusive of several comparative experiments. Simulation results demonstrate that our proposed algorithm can yield superior test accuracy while maintaining low power consumption.

The remainder of this paper is structured as follows. Our proposed framework's system model, along with the optimization problem formulation, is elucidated in Section 2. The solution via Lyapunov optimization theory is laid out in Section 3. Section 4 comprises the simulation results, showcasing the superiority of our proposed scheme. Finally, Section 5 concludes the paper with a discussion.

2. System Model and Problem Formulation

The proposed federated learning framework is shown in Figure 1, consisting of a set of clients $\mathcal{K} = \{1, \dots, K\}$ and a server, with $K$ indicating the total number of clients. Each client $k \in \mathcal{K}$ possesses a local dataset $\mathcal{D}_k = \{(x_i, y_i)\}_{i=1}^{d_k}$, wherein $x_i$ and $y_i$ denote the $i$-th sample and its associated ground-truth label of client $k$, respectively, and $d_k$ stands for the dataset size, originating from $q_k$ label classes.

2.1. Learning Model

Assuming $T$ global iterations, we adopt $a_k^t = 1$ to represent the selection of client $k$ in global iteration $t = 0, \dots, T-1$, with $a_k^t = 0$ otherwise. The client selection decisions are denoted by $\mathbf{a}^t = (a_1^t, \dots, a_K^t)$. The server aims to construct a global model by minimizing the following global loss function:

(1) $w^* = \arg\min_{w} \frac{\sum_{k=1}^{K} d_k f_k(w)}{\sum_{k=1}^{K} d_k}$,

where $f_k(w)$ is the local loss function at client $k$. For instance, the loss function for linear regression is given by:

(2) $f_k(w) = \frac{1}{2}(x_i^T w - y_i)^2$.

The goal of the training process is to find the optimal model $w^*$ through iteration. A global iteration $t$ consists of four steps:

  • (1) Each client reports its side information to the server. Subsequently, the server selects a group of clients and broadcasts the current global model $w^t$ to them.
  • (2) The selected clients execute a local iteration to update their local models $w_k^t$ based on their respective datasets.
  • (3) The selected clients upload their newly updated local models to the server.
  • (4) The server aggregates all the received local models to establish a new global model, as represented by $w^{t+1} = \sum_{k=1}^{K} a_k^t w_k^t / \sum_{k=1}^{K} a_k^t$ (a minimal code sketch of this step follows below).
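
A minimal sketch of the aggregation in step (4), with local models represented as flattened NumPy vectors (an illustrative simplification of the paper's PyTorch setup; function and variable names are ours):

```python
import numpy as np

def aggregate(w_global: np.ndarray, w_local: list, a: list) -> np.ndarray:
    """Step (4): w^{t+1} = sum_k a_k^t * w_k^t / sum_k a_k^t, i.e., the
    plain average of the local models of the selected clients."""
    selected = [w for w, a_k in zip(w_local, a) if a_k == 1]
    if not selected:  # no client selected: keep the previous global model
        return w_global
    return sum(selected) / len(selected)
```
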
2.2. Power Consumption Model

In each global iteration, the selected clients engage in training and uploading models while the server aggregates these received models. This process contributes to power consumption. We represent the overall CPU frequency control decisions of the clients as $\mathbf{f}^t = (f_1^t, \dots, f_K^t)$, where $f_k^t$ indicates the CPU frequency of client $k$ in global iteration $t$. Notably, if $a_k^t = 0$, then $f_k^t$ also equals zero. The power consumed by training the model can be expressed as $P_k^{t,\mathrm{tr}} = \gamma_1 (f_k^t)^3 a_k^t$, where $\gamma_1$ denotes the capacitance coefficient of the clients [18]. Let $P_k^{t,\mathrm{up}} = p_k^t a_k^t$ denote the power spent on uploading the model; thus, the total power consumption of client $k$ during global iteration $t$ is given by:

(3) $P_k^t = P_k^{t,\mathrm{tr}} + P_k^{t,\mathrm{up}}$.

On the other hand, the CPU frequency of the server during global iteration $t$ is represented by $f_r^t$, and the server's capacitance coefficient is denoted as $\gamma_2$. Consequently, the power consumption of the server during global iteration $t$ can be formulated as follows:

(4) $P_r^t = \gamma_2 (f_r^t)^3$.
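
Both power models translate directly into code; a sketch, using the capacitance coefficients reported later in Section 4 (function names are ours):

```python
GAMMA_1 = 1e-28  # client capacitance coefficient (gamma_1, Section 4)
GAMMA_2 = 1e-28  # server capacitance coefficient (gamma_2, Section 4)

def client_power(f_k: float, p_k: float, a_k: int) -> float:
    """Equation (3): cubic training power plus upload power, both gated
    by the selection decision a_k (zero when the client is idle)."""
    return (GAMMA_1 * f_k ** 3 + p_k) * a_k

def server_power(f_r: float) -> float:
    """Equation (4): cubic aggregation power of the server."""
    return GAMMA_2 * f_r ** 3
```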

2.3. Latency Model

Let $m$ represent the number of local iterations in each global iteration, and $c_k$ stand for the number of CPU cycles necessary to process a sample from client $k$. The local training latency for selected client $k$ in global iteration $t$ can then be calculated as $\tau_k^{t,\mathrm{tr}} = m c_k d_k a_k^t / f_k^t$, which is inversely proportional to the allocated local computing power $f_k^t$.

When the local training is finished, the selected clients upload their models to the server via orthogonal frequency-division multiple access (OFDMA). The total available bandwidth is denoted as B, and it is assumed that this bandwidth is equally allocated to the selected clients during the global iteration t. Consequently, the bandwidth allocated to a selected client k in global iteration t can be represented as

(5) $b^t = B / \sum_{k=1}^{K} a_k^t$.

The model size is represented as $s$; therefore, the latency for model uploading is given by

(6) $\tau_k^{t,\mathrm{up}} = \frac{s\, a_k^t}{b^t \log_2\!\left(1 + \frac{h_k^t p_k^t}{N_0 b^t}\right)}$,

where $h_k^t$ denotes the channel gain between client $k$ and the server during global iteration $t$, which is assumed to be available at the transmitter side, and $N_0$ denotes the power spectral density of the noise. The total latency of client $k$ can be formulated as:

(7) $\tau_k^t = \tau_k^{t,\mathrm{tr}} + \tau_k^{t,\mathrm{up}}$.

At the server side, let $\tau_r^t$ denote the latency of the server in global iteration $t$, which can be written as:

(8) $\tau_r^t = \frac{\phi \sum_{k=1}^{K} a_k^t}{f_r^t}$,

where $\phi$ is the quantity of processing cycles required to carry out a single summation operation [16].

We assume that the server starts aggregating after receiving all the local models of selected clients. Therefore, the learning latency of global iteration t is bottlenecked by the straggler clients and can be derived as:

(9) $\tau^t = \max_{k=1,\dots,K}(\tau_k^t) + \tau_r^t$.
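
Putting Equations (5)–(9) together, the per-iteration latency can be computed as in the following sketch (a direct transcription under the equal-bandwidth assumption; argument names mirror the symbols of this subsection, and at least one selected client is assumed):

```python
import math

def iteration_latency(a, f, f_r, h, p, d, c, m, s, B, N0, phi):
    """Equations (5)-(9): straggler-bound learning latency of iteration t."""
    n_sel = sum(a)
    b = B / n_sel  # (5): bandwidth split equally among selected clients
    tau = []
    for k in range(len(a)):
        if a[k] == 0:
            continue
        tau_tr = m * c[k] * d[k] / f[k]                   # local training
        rate = b * math.log2(1.0 + h[k] * p[k] / (N0 * b))
        tau_up = s / rate                                 # (6): model upload
        tau.append(tau_tr + tau_up)                       # (7)
    tau_r = phi * n_sel / f_r                             # (8): aggregation
    return max(tau) + tau_r                               # (9)
```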

2.4. Cost Model

The non-IID nature of data introduces biases in the training process, which significantly impacts the accuracy of FL. As noted in [13], a larger number of label classes might result in a more robust trained model, and the non-IID nature could diminish when clients possess more label classes. In this paper, we use the number of label classes $q_k$ to quantify the non-IID nature, aiming to minimize both the learning latency and the accuracy degradation caused by non-IID data. However, reducing the latter could potentially increase the learning latency. Therefore, we propose a cost objective function $U^t$ to balance the two goals during global iteration $t$:

(10) $U^t(\mathbf{a}^t, \mathbf{f}^t, f_r^t) = \tau^t - \mu \sum_{k=1}^{K} a_k^t q_k$,

where $\mu$ is a price parameter, which turns the label classes into a cost form [19].
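
The cost of an iteration is then one line on top of the latency computed above; a sketch (the value of $\mu$ follows Section 4, and names are ours):

```python
MU = 1.6e-3  # price parameter mu (Section 4)

def iteration_cost(tau_t: float, a: list, q: list) -> float:
    """Equation (10): latency minus the priced label-class count of the
    selected clients; more label classes lower the cost."""
    return tau_t - MU * sum(a_k * q_k for a_k, q_k in zip(a, q))
```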

2.5. Problem Formulation

From the aforementioned discussion, we consider an optimization problem that minimizes the time-averaged cost function through joint client selection and CPU frequency control as follows:

(11) $\mathcal{P}1: \min_{G^0, \dots, G^{T-1}} \frac{1}{T} \sum_{t=0}^{T-1} U^t(\mathbf{a}^t, \mathbf{f}^t, f_r^t)$

(12) s.t. $f_k^{\min} \le f_k^t \le f_k^{\max}, \ \forall k, t$,

(13) $f_r^{\min} \le f_r^t \le f_r^{\max}, \ \forall t$,

(14) $a_k^t \in \{0, 1\}, \ \forall k, t$,

(15) $\frac{1}{T} \sum_{t=0}^{T-1} P_k^t \le \bar{P}_k, \ \forall k$,

(16) $\frac{1}{T} \sum_{t=0}^{T-1} P_r^t \le \bar{P}_r$,

where $G^t = (\mathbf{a}^t, \mathbf{f}^t, f_r^t)$ denotes the optimization variables in global iteration $t$, $t = 0, 1, \dots, T-1$. Constraints (12) and (13) specify the CPU frequency ranges of each client and the server, respectively. Constraint (14) defines whether each client is selected or not. Constraint (15) guarantees that the average power consumption of each client is limited by $\bar{P}_k$, while constraint (16) guarantees that the average power consumption of the server is limited by $\bar{P}_r$. For clarity, in the following sections of this paper, we succinctly refer to the cost function introduced in Equation (10) as $U^t$.

3. Problem Solution and Algorithm Design

A direct resolution of problem $\mathcal{P}1$ is not viable due to the time-averaged optimization objective and the long-term power constraints. Therefore, problem $\mathcal{P}1$ is initially transformed into a per-iteration problem by utilizing Lyapunov optimization theory. Subsequently, this per-iteration problem is decomposed into two distinct subproblems: a CPU frequency control problem, which assumes fixed client selection decisions, and a client selection problem that operates under the optimal CPU frequency setting.

3.1. Problem Transformation via Stochastic Optimization Theory

The resolution of problem $\mathcal{P}1$ necessitates comprehensive information, such as channel gain, pertaining to all $T$ global iterations. However, the unavailability of future information at the present moment presents a formidable challenge. To circumvent this issue, $\mathcal{P}1$ is converted into a series of subproblems, the solutions of which do not rely on knowledge of future iterations. This transformation is achieved through the application of Lyapunov optimization theory [20] and the introduction of virtual queue techniques. For each client, a virtual power deficit queue $Z_k^t$ is established, with an initial condition of $Z_k^0 = 0$, and updated at the end of each global iteration as follows:

(17) $Z_k^{t+1} = \max\{P_k^t - \bar{P}_k + Z_k^t, 0\}$,

where $Z_k^t$ encapsulates the disparity between the power consumption and the long-term power constraint of client $k$ over $T$ iterations. A similar approach can be used to construct a virtual power deficit queue $Y_r^t$ for the server, as depicted:

(18) $Y_r^{t+1} = \max\{P_r^t - \bar{P}_r + Y_r^t, 0\}$.
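
Both queue updates are one-liners; a sketch of the bookkeeping performed at the end of every global iteration (names are ours):

```python
def update_virtual_queues(Z, Y_r, P, P_bar, P_r, P_r_bar):
    """Equations (17)-(18): virtual power deficit queues accumulate any
    excess of instantaneous power over the long-term budget."""
    Z_next = [max(P_k - Pb_k + Z_k, 0.0)
              for P_k, Pb_k, Z_k in zip(P, P_bar, Z)]
    Y_next = max(P_r - P_r_bar + Y_r, 0.0)
    return Z_next, Y_next
```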

To maintain the mean rate stability of the queues, we first establish a Lyapunov function in the following form:

(19) $L(\Theta^t) = \frac{1}{2} \left[ \sum_{k=1}^{K} (Z_k^t)^2 + (Y_r^t)^2 \right]$,

where $\Theta^t$ symbolizes all the virtual deficit queues. Then we formulate the Lyapunov drift to measure the expected increase of $L(\Theta^t)$ as follows:

(20) $\Delta(\Theta^t) = \mathbb{E}[L(\Theta^{t+1}) - L(\Theta^t) \mid \Theta^t]$.

With the objective of restricting the growth of virtual deficit queues and minimizing the cost function, the objective function is integrated into the Lyapunov drift. Consequently, the drift-plus-cost function is defined as follows:

(21) $\Delta(\Theta^t) + V \mathbb{E}[U^t \mid \Theta^t]$,

where $V$ serves as a control parameter that aids in balancing the trade-off between minimizing the objective function and adhering to the power constraints. An observation of (21) indicates that it solely involves the current iteration $t$, signifying that the original problem $\mathcal{P}1$ can be transitioned into a real-time problem solved on a per-iteration basis. The application of Lyapunov optimization theory provides the following theorem regarding the upper bound of the drift-plus-cost function:

Theorem 1.

Assume $P_k^t \le P_k^{\max}$ for each client $k$ and $P_r^t \le P_r^{\max}$ for the server in global iteration $t$. The drift-plus-cost function satisfies:

(22) $\Delta(\Theta^t) + V \mathbb{E}[U^t \mid \Theta^t] \le C_1 + \sum_{k=1}^{K} Z_k^t \mathbb{E}[P_k^t - \bar{P}_k \mid \Theta^t] + Y_r^t \mathbb{E}[P_r^t - \bar{P}_r \mid \Theta^t] + V \mathbb{E}[U^t \mid \Theta^t]$,

where $C_1$ is a finite constant satisfying $C_1 \ge \frac{1}{2} \sum_{k=1}^{K} (P_k^{\max} - \bar{P}_k)^2 + \frac{1}{2} (P_r^{\max} - \bar{P}_r)^2$.

Proof.

The proof is given in Appendix A. □

By minimizing the upper bound in Equation (22), virtual deficit queue stability is achieved concurrently with cost function minimization. Upon excluding all constant terms (i.e., $C_1$, $\bar{P}_k Z_k^t$, and $\bar{P}_r Y_r^t$), problem $\mathcal{P}1$ can be transformed into a per-iteration problem $\mathcal{P}2$:

(23) $\mathcal{P}2: \min_{G^t} \sum_{k=1}^{K} P_k^t Z_k^t + P_r^t Y_r^t + V U^t \quad \text{s.t. (12)–(14)}.$

3.2. Problem Solution

To reduce complexity, $U^t$ in Equation (21) is substituted with an upper bound $\tilde{U}^t = \sum_{k=1}^{K} \tau_k^t + \tau_r^t - \mu \sum_{k=1}^{K} a_k^t q_k$, derivable through the application of $\max_{k=1,\dots,K}(\tau_k^t) \le \sum_{k=1}^{K} \tau_k^t$. Consequently, the resolution of $\mathcal{P}2$ can be reoriented towards the following problem:

(24) $\mathcal{P}3: \min_{G^t} \sum_{k=1}^{K} P_k^t Z_k^t + P_r^t Y_r^t + V \tilde{U}^t \quad \text{s.t. (12)–(14)}.$

Problem $\mathcal{P}3$ manifests as a mixed-integer problem and poses a significant challenge for direct resolution. However, given any $\mathbf{a}^t$, the objective function of $\mathcal{P}3$ becomes a convex function with respect to the CPU frequencies of the selected clients and the server, i.e., $f_k^t$ and $f_r^t$. Consequently, the optimal CPU frequencies for the selected clients and the server can be efficiently obtained as

(25) $(f_k^t)^* = \begin{cases} f_k^{\min}, & \text{if } \sqrt[4]{\frac{V m c_k d_k}{3 Z_k^t \gamma_1}} < f_k^{\min}, \\ f_k^{\max}, & \text{if } \sqrt[4]{\frac{V m c_k d_k}{3 Z_k^t \gamma_1}} > f_k^{\max}, \\ \sqrt[4]{\frac{V m c_k d_k}{3 Z_k^t \gamma_1}}, & \text{otherwise}, \end{cases}$

and

(26) $(f_r^t)^* = \begin{cases} f_r^{\min}, & \text{if } \sqrt[4]{\frac{V \phi \sum_{k=1}^{K} a_k^t}{3 Y_r^t \gamma_2}} < f_r^{\min}, \\ f_r^{\max}, & \text{if } \sqrt[4]{\frac{V \phi \sum_{k=1}^{K} a_k^t}{3 Y_r^t \gamma_2}} > f_r^{\max}, \\ \sqrt[4]{\frac{V \phi \sum_{k=1}^{K} a_k^t}{3 Y_r^t \gamma_2}}, & \text{otherwise}, \end{cases}$

respectively.
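
Both closed forms are a fourth root clipped to the feasible range; a sketch (the zero-queue guard is our addition: with an empty deficit queue, the objective strictly decreases in the frequency, so the maximum frequency is optimal):

```python
def optimal_client_freq(V, m, c_k, d_k, Z_k, f_min, f_max, gamma_1=1e-28):
    """Equation (25): minimizer of Z_k*gamma_1*f^3 + V*m*c_k*d_k/f,
    clipped to [f_min, f_max]."""
    if Z_k == 0.0:  # empty queue: only the latency term remains
        return f_max
    f = (V * m * c_k * d_k / (3.0 * Z_k * gamma_1)) ** 0.25
    return min(max(f, f_min), f_max)

def optimal_server_freq(V, phi, n_sel, Y_r, f_min, f_max, gamma_2=1e-28):
    """Equation (26): the same closed form for the server's frequency."""
    if Y_r == 0.0:
        return f_max
    f = (V * phi * n_sel / (3.0 * Y_r * gamma_2)) ** 0.25
    return min(max(f, f_min), f_max)
```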

With the optimal CPU frequencies established, the objective of problem $\mathcal{P}2$ becomes a function of $\mathbf{a}^t$ and can consequently be transformed as follows:

(27) $\mathcal{P}4: \min_{\mathbf{a}^t} \sum_{k=1}^{K} P_k^t Z_k^t + P_r^t Y_r^t + V U^t \quad \text{s.t. } f_k^t = (f_k^t)^*, \forall k; \ f_r^t = (f_r^t)^*; \ (14).$

A straightforward strategy to resolve $\mathcal{P}4$ involves traversing all possible client selection scenarios and then selecting the scheme that minimizes the objective function. However, the complexity of this approach escalates rapidly with an increase in the total number of clients. Therefore, we introduce an efficient algorithm designed to address $\mathcal{P}4$ in Algorithm 1. In this proposed algorithm, during each global iteration, clients with an index $I_k^t = P_k^t Z_k^t - V \mu q_k$ no greater than 0 are included in the initial set $\mathcal{X}_0^t$. Thereafter, considering that the learning latency is determined by straggler clients, these $|\mathcal{X}_0^t|$ clients are incorporated one by one into the auxiliary selection set $\mathcal{X}_a^t$ in ascending order of their total latency $\tau_k^t$, thereby generating $|\mathcal{X}_0^t|$ auxiliary selection sets. Here, $|\cdot|$ signifies the count of elements within a set. These $|\mathcal{X}_0^t|$ auxiliary selection sets are subsequently accumulated in the candidate set $\mathcal{X}^t$. We then compute the value of the objective function of $\mathcal{P}4$ for each auxiliary selection set within $\mathcal{X}^t$ and select the optimal auxiliary selection set $(\mathcal{X}_a^t)^*$ that minimizes the objective function of $\mathcal{P}4$. Utilizing our proposed algorithm, throughout each global iteration, only $|\mathcal{X}_0^t|$ computations of the objective function are required to attain the optimal solution, a significantly lower complexity compared to the exhaustive traversal method.

Algorithm 1 Client Selection Algorithm

1: Input: virtual queues $Z_k^t, \forall k$, and $Y_r^t$ (initialized as $Z_k^0 = 0, \forall k$, and $Y_r^0 = 0$)

2: Set $\mathcal{X}_0^t = \emptyset$, $\mathcal{X}_a^t = \emptyset$, $\mathcal{X}^t = \emptyset$

3: for $k \in \mathcal{K}$ do

4:  Calculate $I_k^t = P_k^t Z_k^t - V \mu q_k$

5:  if $I_k^t \le 0$ then

6:   $\mathcal{X}_0^t = \mathcal{X}_0^t \cup \{k\}$

7:  end if

8: end for

9: Rank the clients in $\mathcal{X}_0^t$ in ascending order of their $\tau_k^t$, so that $\tau_1^t \le \tau_2^t \le \dots \le \tau_{|\mathcal{X}_0^t|}^t$

10: for $x \in \mathcal{X}_0^t$ do

11:  Update $\mathcal{X}_a^t = \mathcal{X}_a^t \cup \{x\}$

12:  Add $\mathcal{X}_a^t$ to $\mathcal{X}^t$, i.e., $\mathcal{X}^t = \mathcal{X}^t \cup \{\mathcal{X}_a^t\}$

13:  Calculate $J(\mathcal{X}_a^t) = \sum_{k \in \mathcal{X}_a^t} P_k^t Z_k^t + P_r^t Y_r^t + V U^t$

14: end for

15: Find $(\mathcal{X}_a^t)^* = \arg\min_{\mathcal{X}_a^t \in \mathcal{X}^t} J(\mathcal{X}_a^t)$

16: Return $(\mathbf{a}^t)^*$, where $(a_k^t)^* = \mathbb{1}\{k \in (\mathcal{X}_a^t)^*\}, \forall k$
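
A direct Python transcription of Algorithm 1 follows (a sketch: `server_power_fn` and `cost_fn` stand in for Equations (4) and (10) evaluated at the optimal CPU frequencies, and all names are ours):

```python
def select_clients(K, V, mu, q, tau, P, Z, Y_r, server_power_fn, cost_fn):
    """Algorithm 1: greedy construction of nested candidate sets followed
    by a single pass that picks the set minimizing the P4 objective."""
    # Lines 3-8: initial set X0 of clients with non-positive index I_k
    X0 = [k for k in range(K) if P[k] * Z[k] - V * mu * q[k] <= 0]
    # Line 9: ascending order of total latency (straggler awareness)
    X0.sort(key=lambda k: tau[k])
    # Lines 10-15: grow the auxiliary set and track the best objective
    best_J, best_set, Xa = float("inf"), [], []
    for x in X0:
        Xa.append(x)
        J = (sum(P[k] * Z[k] for k in Xa)
             + server_power_fn(len(Xa)) * Y_r
             + V * cost_fn(list(Xa)))
        if J < best_J:
            best_J, best_set = J, list(Xa)
    # Line 16: return the selection decision in indicator form
    return [1 if k in best_set else 0 for k in range(K)]
```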

3.3. Analysis of the Proposed Optimization Scheme's Optimality

Given the trade-off between minimizing the time-averaged cost and reducing power consumption violations, the analysis of the proposed optimization strategy's optimality is provided herein.

Theorem 2.

The average cost function satisfies:

(28) $\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[U^t \mid \Theta^t] \le \frac{C_2}{V} + \varphi^*$,

where $C_2$ is a finite constant satisfying $C_2 \ge C_1 + \sum_{k=1}^{K} Z_k^{t,\max} (P_k^{\max} - \bar{P}_k) + Y_r^{t,\max} (P_r^{\max} - \bar{P}_r)$, with $Z_k^{t,\max}$ and $Y_r^{t,\max}$ denoting the maximum backlogs of the virtual queues, and $\varphi^*$ is the optimal solution of problem $\mathcal{P}1$.

Proof.

The proof is given in Appendix B. □

Theorem 3.

Assume $\mathbb{E}[U^t \mid \Theta^t] \ge \varphi^{\min}$. The total power consumption of each client $k$ and of the server over $T$ iterations is bounded by $T \bar{P}_k + \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}$ and $T \bar{P}_r + \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}$, respectively.

Proof.

The proof is given in Appendix C. □

Theorem 2 elucidates that the discrepancy between the objective value yielded by the proposed algorithm and the original optimal value is at most $O(1/V)$. This suggests that the cost attained by the proposed optimization scheme can approximate the original optimal value to an arbitrary degree through the augmentation of the control parameter $V$. In accordance with Theorem 3, the power deficit queues of all clients and the server adhere to an upper limit of $O(V)$ at any iteration, a limit that grows with the control parameter $V$. Nonetheless, an excessively large value of $V$ may result in an unduly large upper bound on the virtual power deficit queue backlog, which could lead to power consumption surpassing the power budget. In summary, the proposed algorithm delivers an $[O(1/V), O(V)]$ trade-off between cost and power consumption, a balance that can be managed by adjusting the parameter $V$.

Theorem 4.

The virtual queues of each client $k$ and the server satisfy:

(29) $\lim_{T \to \infty} \frac{Z_k^T}{T} = 0, \quad \lim_{T \to \infty} \frac{Y_r^T}{T} = 0.$

Proof.

The proof is given in Appendix D. □

Theorem 4 indicates that the virtual power deficit queue backlog remains bounded as the number of global iterations approaches infinity, i.e., all virtual queues remain mean rate stable throughout the FL process.

4. Experiment Result and Analysis

4.1. Experiment Settings

In the conducted experiment, FL was implemented using PyTorch, considering a system setup in which $K$ clients are randomly positioned within a circular area of a 500 m radius with a central server. The path loss model is defined as $128.1 + 37.6 \log_{10}(i) + \psi$ (in dB), where $i$ represents the distance between a client and the server in kilometers, while $\psi$ is a Gaussian random variable exhibiting a variance of 8 dB. The total bandwidth $B$ is set to 100 MHz, with the noise power spectral density $N_0 = -174$ dBm/Hz.

The power used for uploading the local model is randomly assigned between 10 and 20 dBm. The model size $s$ is set to 1 Mbit. For all clients, the number of local iterations in each global iteration, $m$, is set to 1. The number of CPU cycles necessary for processing a sample per client is randomly distributed within the range of $[1, 3] \times 10^4$ cycles/sample. Average power constraints are established at $\bar{P}_k = 100$ mW and $\bar{P}_r = 500$ mW. The control parameter $V$ is assigned the value of 10, with a justification provided later. The CPU frequency ranges of the clients and the central server, $f_k^t$ and $f_r^t$, span from 0.1 GHz to 2.5 GHz and from 0.1 GHz to 3.3 GHz, respectively. Furthermore, the capacitance coefficients of the clients and the server, the price parameter of the cost function, and the number of CPU cycles needed to perform a single summation are set to $\gamma_1 = \gamma_2 = 10^{-28}$, $\mu = 1.6 \times 10^{-3}$, and $\phi = 10^6$, respectively.
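
For reproducibility, channel gains consistent with this setup could be drawn as in the following sketch (our own sampler, not the authors' code; it treats the stated 8 dB figure as the shadowing variance):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_channel_gains_db(num_clients: int, radius_km: float = 0.5):
    """Path loss 128.1 + 37.6*log10(d_km) plus Gaussian shadowing, for
    clients placed uniformly at random in a disk around the server."""
    # uniform placement in a disk: square-root transform on the radius
    d_km = radius_km * np.sqrt(rng.uniform(1e-4, 1.0, num_clients))
    shadowing = rng.normal(0.0, np.sqrt(8.0), num_clients)
    path_loss_db = 128.1 + 37.6 * np.log10(d_km) + shadowing
    return -path_loss_db  # channel gain in dB = negative path loss
```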

The MNIST dataset [21] was employed for the experiment, consisting of 60,000 training samples and 10,000 test samples with 10 label classes from 0 to 9. Each client's local dataset was assembled by randomly selecting one or two label classes from the MNIST dataset, with $d_k = 100$ samples. A multi-layer perceptron (MLP) model with a single hidden layer containing 64 nodes was utilized, with ReLU as the activation function. The learning rate was set to 0.01, and the batch size was 10.
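
One way to realize the described network in PyTorch (a sketch; the class name and layer arrangement are ours, sized for 28x28 MNIST inputs):

```python
import torch.nn as nn

class MLP(nn.Module):
    """Single-hidden-layer perceptron: 784 pixels -> 64 ReLU units -> 10 logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),           # 28x28 image -> 784-dim vector
            nn.Linear(28 * 28, 64),
            nn.ReLU(),
            nn.Linear(64, 10),      # one logit per MNIST digit class
        )

    def forward(self, x):
        return self.net(x)
```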

To demonstrate the advantage of our proposed algorithm, we introduce the following three algorithms as comparison benchmarks:

  • Selected All: In this algorithm, all the clients are selected in each global iteration. The CPU frequency for both the clients and the central server is consistently set at their maximum values in every global iteration.
  • Greedy: For a fair comparison with our proposed algorithm, the long-term average number of clients selected per round in this comparative algorithm is tuned to be consistent with that of our proposed algorithm. To this end, we establish a client-selection latency threshold. Clients are subsequently chosen one by one in ascending order of their individual total latency $\tau_k^t$ until the learning latency $\tau^t$ surpasses the preset threshold. Furthermore, subject to the CPU frequency constraints, all participating clients and the server maintain a constant power level identical to the long-term power constraint.
  • Random: In this comparative algorithm, clients are randomly selected in each round. The number of clients selected is maintained at a constant value, which is equal to the average number per round in our proposed algorithm. Aside from this variation, all other configurations align with those of the Greedy algorithm.
4.2. Analysis of Experimental Results

Conceptually, reducing the time required for each global iteration and minimizing the impact of the non-IID nature on the model convergence speed enables the training model to reach a specific accuracy more rapidly within a given learning time. Figure 2 demonstrates how the test accuracy of our proposed algorithm and the comparative algorithms varies with the learning time for $K = 100$ clients. It is apparent that the proposed algorithm exhibits a performance almost equivalent to the Selected All algorithm in terms of convergence speed. Even though all the clients participate in each global iteration, fostering a swift convergence speed, the effects of the non-IID nature stemming from each client's dataset cannot be negated, thereby undermining its performance. Conversely, in our proposed model, clients with more label classes in their dataset may inherently have a higher selection priority. Simultaneously, the mean rate stable properties of the virtual queue in the proposed algorithm ensure fairness for clients with fewer label classes. Our proposed model significantly outperforms both the Greedy and Random algorithms. In the Greedy algorithm, while the impact of straggler clients is mitigated, it does not address the influence of the non-IID nature. Since the presence of the non-IID nature and straggler clients are not taken into account, the convergence speed of the Random algorithm is impeded.

Figure 3 demonstrates the corresponding average power consumption of the client side and the server side in each global iteration. The Selected All algorithm, when compared to other methods, exhibits substantially larger power consumption, primarily because it lacks a power constraint. However, due to the mean rate stable characteristic of the virtual queue, as demonstrated by Theorem 4, the power consumption under our proposed algorithm adheres to the long-term power constraint. Notably, the average power consumption under our proposed algorithm is approximately similar to that observed in the Greedy and Random algorithms.

To further validate the proposed optimization scheme, its performance is examined under a varying total number of clients $K$, as depicted in Figure 4 and Table 1. In the conducted experiment, we report the average test accuracy over the concluding half second of a 30-second learning period. As the total number of clients increases, the average number of clients selected per iteration also escalates, which, under normal circumstances, should enhance test accuracy. Nonetheless, the increase in the number of clients may cause a corresponding increment in each iteration's training duration. As a consequence, the total number of iterations that can be carried out within a fixed time duration may decrease, thereby potentially reducing accuracy. Hence, the test accuracy does not bear a linear relationship with the total number of clients, which can be observed from Figure 4. Nevertheless, our proposed algorithm continues to surpass the comparative algorithms, as it adeptly manages non-IID characteristics and straggler clients. Furthermore, our proposed algorithm exhibits a consistent ability to maintain low power consumption as the total number of clients increases. This finding aligns with our previous analysis, demonstrating that the virtual power deficit queues are mean rate stable in our proposed algorithm.

Figure 5 depicts the variation in the average power consumption and the average cost in correlation with the control parameter V within our proposed optimization scheme. A clear observation is that the average cost experiences a decrease, while the average power consumption undergoes an increase with an escalating control parameter V. This aligns with the [O(1/V),O(V)] cost–power trade-off indicated by Theorems 2 and 3. Figure 6 illustrates the variation in the optimal average number of selected clients per iteration with the control parameter V. As previously stated, a control parameter value of V=10 was selected for the experiment. This choice was made because it yields an appropriate average number of selected clients to effectively address the accuracy degradation incited by non-IID characteristics, along with the low average cost and power consumption.

5. Discussion

In this paper, we explored a problem involving the selection of clients and the concurrent control of the CPU frequency for both the selected clients and the server within wireless FL networks. Lyapunov optimization theory was applied to transform the original problem into a per-iteration problem, which facilitated the design of an algorithm for problem resolution. Theoretical analysis offers performance guarantees, wherein controlling the parameter V empowers us to reduce cost while minimizing power consumption. Simulation results demonstrated that the proposed algorithm outperforms benchmark algorithms in terms of test accuracy by mitigating the impact of non-IID characteristics and straggler clients. By managing the virtual queues, the proposed algorithm was able to adhere to long-term power constraints. Furthermore, the simulation results verified that our proposed algorithm successfully realized the [O(1/V),O(V)] cost–power trade-off.

It is noteworthy that this study is currently confined to a simple star network topology. Expanding our analysis to encompass more intricate network structures such as hierarchical networks and multi-base station networks would undoubtedly enhance its applicability. Additionally, in practical wireless networks, client participation in learning can be affected by factors such as mobility, network congestion, or power availability fluctuations, potentially leading to client dropouts from the FL process and thereby impacting overall learning performance. Hence, the implications of client dropouts merit further investigation.

Figures and Table

Figure 1. Federated learning framework in wireless networks.

Figure 2. Test accuracy versus learning latency with the number of clients K = 100.

Figure 3. Average power consumption of each client and the server. (a) Each client. (b) Server.

Figure 4. Average test accuracy versus total number of clients.

Figure 5. The impact of V. (a) Average power consumption of clients and average cost versus control parameter V. (b) Average power consumption of the server and average cost versus control parameter V.

Figure 6. Average number of selected clients versus control parameter V.

Table 1. Average power consumption of clients and the server versus total number of clients.

| Average Power Consumption | Total Number of Clients | Proposed | Selected All | Greedy | Random |
|---|---|---|---|---|---|
| Clients | 70 | 7016.21 mW | 113,212.66 mW | 5616.25 mW | 5600.00 mW |
| Clients | 80 | 8020.03 mW | 129,386.61 mW | 6401.84 mW | 6400.00 mW |
| Clients | 90 | 9033.43 mW | 145,559.47 mW | 7229.53 mW | 7200.01 mW |
| Clients | 100 | 10,036.40 mW | 161,733.26 mW | 7743.76 mW | 7800.00 mW |
| Clients | 110 | 11,030.69 mW | 177,907.72 mW | 8439.06 mW | 8400.01 mW |
| Clients | 120 | 12,053.63 mW | 194,080.37 mW | 9336.41 mW | 9300.01 mW |
| Clients | 130 | 13,047.53 mW | 210,254.14 mW | 9940.70 mW | 9900.01 mW |
| Server | 70 | 499.86 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 80 | 499.94 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 90 | 499.94 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 100 | 499.99 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 110 | 500.03 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 120 | 500.04 mW | 3593.70 mW | 500.00 mW | 500.00 mW |
| Server | 130 | 500.24 mW | 3593.70 mW | 500.00 mW | 500.00 mW |

Author Contributions

Conceptualization, Z.Z.; methodology, Z.Z., Y.L. and F.W.; software, Z.Z. and S.S.; validation, Z.Z., Y.L. and S.S.; formal analysis, Z.Z., Y.L. and F.W.; investigation, Z.Z., Y.L. and Y.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Y.L. and F.W. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FLFederated learning
IoT Internet of Things
non-IIDNon-independent and identically distributed
IIDIndependent identically distributed
OFDMAOrthogonal frequency-division multiple access
MLPMulti-layer perceptron
ReLURectified Linear Unit

Appendix A

Since $\Delta(\Theta^t)$ plays a crucial role in the proof, we first bound $\Delta(\Theta^t)$. Plugging (17)–(19) into (20), we have:

$\Delta(\Theta^t) = \frac{1}{2} \mathbb{E}\left[ \sum_{k=1}^{K} (Z_k^{t+1})^2 + (Y_r^{t+1})^2 - \sum_{k=1}^{K} (Z_k^t)^2 - (Y_r^t)^2 \,\middle|\, \Theta^t \right]$
$= \frac{1}{2} \mathbb{E}\left[ \sum_{k=1}^{K} \left( \max\{P_k^t - \bar{P}_k + Z_k^t, 0\} \right)^2 - \sum_{k=1}^{K} (Z_k^t)^2 + \left( \max\{P_r^t - \bar{P}_r + Y_r^t, 0\} \right)^2 - (Y_r^t)^2 \,\middle|\, \Theta^t \right]$
$\le \frac{1}{2} \mathbb{E}\left[ \sum_{k=1}^{K} (P_k^t - \bar{P}_k + Z_k^t)^2 - \sum_{k=1}^{K} (Z_k^t)^2 + (P_r^t - \bar{P}_r + Y_r^t)^2 - (Y_r^t)^2 \,\middle|\, \Theta^t \right]$
$\le C_1 + \sum_{k=1}^{K} Z_k^t \mathbb{E}[P_k^t - \bar{P}_k \mid \Theta^t] + Y_r^t \mathbb{E}[P_r^t - \bar{P}_r \mid \Theta^t]$, (A1)

where the first inequality is due to $(\max\{a, 0\})^2 \le a^2$. Thus, we have:

$\Delta(\Theta^t) + V \mathbb{E}[U^t \mid \Theta^t] \le C_1 + \sum_{k=1}^{K} Z_k^t \mathbb{E}[P_k^t - \bar{P}_k \mid \Theta^t] + Y_r^t \mathbb{E}[P_r^t - \bar{P}_r \mid \Theta^t] + V \mathbb{E}[U^t \mid \Theta^t]$. (A2)

This concludes the proof.

Appendix B

According to Appendix A, we have:

$\Delta(\Theta^t) + V \mathbb{E}[U^t \mid \Theta^t] \le C_1 + \sum_{k=1}^{K} Z_k^{t,\max} \mathbb{E}[P_k^{\max} - \bar{P}_k \mid \Theta^t] + Y_r^{t,\max} \mathbb{E}[P_r^{\max} - \bar{P}_r \mid \Theta^t] + V \varphi^* \le C_2 + V \varphi^*$. (A3)

Summing (A3) over T global iterations, we obtain:

$\sum_{t=0}^{T-1} V \mathbb{E}[U^t \mid \Theta^t] \le T C_2 + T V \varphi^* - \sum_{t=0}^{T-1} \Delta(\Theta^t)$. (A4)

Next we sum Δ(Θt) over T global iterations. The following formula can be derived from (20):

$\sum_{t=0}^{T-1} \Delta(\Theta^t) = L(\Theta^T) - L(\Theta^0)$. (A5)

Plugging (A5) into (A4), we have:

$\sum_{t=0}^{T-1} V \mathbb{E}[U^t \mid \Theta^t] \le T C_2 + T V \varphi^* - L(\Theta^T) + L(\Theta^0)$. (A6)

Dividing by $TV$ and using the facts that $L(\Theta^T) \ge 0$ and $L(\Theta^0) = 0$, we have:

$\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}[U^t \mid \Theta^t] \le \frac{C_2}{V} + \varphi^*$. (A7)

This concludes the proof.

Appendix C

Rearranging (A6), we obtain:

$L(\Theta^T) - L(\Theta^0) \le T C_2 + T V \varphi^* - \sum_{t=0}^{T-1} V \mathbb{E}[U^t \mid \Theta^t]$. (A8)

Substituting $\mathbb{E}[U^t \mid \Theta^t] \ge \varphi^{\min}$ into (A8), we have:

$L(\Theta^T) \le T C_2 + T V \varphi^* - T V \varphi^{\min}$. (A9)

Due to (19), we obtain:

$(Z_k^T)^2 + (Y_r^T)^2 \le \sum_{k=1}^{K} (Z_k^T)^2 + (Y_r^T)^2 \le 2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}$. (A10)

Thus, we have:

$(Z_k^T)^2 \le 2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}, \quad (Y_r^T)^2 \le 2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}$. (A11)

Next we bound the virtual power deficit queues. From (17) and (18), we have:

$Z_k^{t+1} = Z_k^t - \min\{\bar{P}_k - P_k^t, Z_k^t\}, \quad Y_r^{t+1} = Y_r^t - \min\{\bar{P}_r - P_r^t, Y_r^t\}$. (A12)

Rearranging terms and summing both sides over t global iterations, we have:

$Z_k^t - Z_k^0 = -\sum_{\tau=0}^{t-1} \min\{\bar{P}_k - P_k^\tau, Z_k^\tau\} \ge \sum_{\tau=0}^{t-1} (P_k^\tau - \bar{P}_k) = \sum_{\tau=0}^{t-1} P_k^\tau - t \bar{P}_k$, (A13)

$Y_r^t - Y_r^0 = -\sum_{\tau=0}^{t-1} \min\{\bar{P}_r - P_r^\tau, Y_r^\tau\} \ge \sum_{\tau=0}^{t-1} (P_r^\tau - \bar{P}_r) = \sum_{\tau=0}^{t-1} P_r^\tau - t \bar{P}_r$. (A14)

Thus, we have:

$Z_k^T \ge \sum_{\tau=0}^{T-1} P_k^\tau - T \bar{P}_k, \quad Y_r^T \ge \sum_{\tau=0}^{T-1} P_r^\tau - T \bar{P}_r$. (A15)

Thus far, we have bounded the virtual power deficit queues. Plugging (A15) into (A11), we have:

$\sum_{\tau=0}^{T-1} P_k^\tau \le T \bar{P}_k + \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}$, (A16)

$\sum_{\tau=0}^{T-1} P_r^\tau \le T \bar{P}_r + \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}$. (A17)

This concludes the proof.

Appendix D

Taking the square root of both sides of (A11), we have:

$Z_k^T \le \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}, \quad Y_r^T \le \sqrt{2 T C_2 + 2 T V \varphi^* - 2 T V \varphi^{\min}}$. (A18)

Dividing by T and taking limits of both sides, we have:

$\lim_{T \to \infty} \frac{Z_k^T}{T} = 0, \quad \lim_{T \to \infty} \frac{Y_r^T}{T} = 0$. (A19)

This concludes the proof.

Footnotes

1 Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

References

1. McMahan B., Moore E., Ramage D., Hampson S., y Arcas B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017.
2. Wang S., Tuor T., Salonidis T., Leung K.K., Makaya C., He T., Chan K. Adaptive federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas Commun. 2019; 37: 1205–1221. 10.1109/JSAC.2019.2904348
3. Gao W., Zhao Z., Min G., Ni Q., Jiang Y. Resource allocation for latency-aware federated learning in industrial internet of things. IEEE Trans. Ind. Inform. 2021; 17: 8505–8513. 10.1109/TII.2021.3073642
4. Hu R., Guo Y., Gong Y. Energy-efficient distributed machine learning at wireless edge with device-to-device communication. In Proceedings of ICC 2022—IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022.
5. Zhao Z., Feng C., Hong W., Jiang J., Jia C., Quek T.Q., Peng M. Federated learning with non-iid data in wireless networks. IEEE Trans. Wirel. Commun. 2021; 21: 1927–1942. 10.1109/TWC.2021.3108197
6. Nishio T., Yonetani R. Client selection for federated learning with heterogeneous resources in mobile edge. In Proceedings of ICC 2019—IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019.
7. Leng J., Lin Z., Ding M., Wang P., Smith D., Vucetic B. Client scheduling in wireless federated learning based on channel and learning qualities. IEEE Wirel. Commun. Lett. 2022; 11: 732–735. 10.1109/LWC.2022.3141792
8. Yang Z., Chen M., Saad W., Hong C.S., Shikh-Bahaei M. Energy efficient federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 2020; 20: 1935–1949. 10.1109/TWC.2020.3037554
9. Yao J., Ansari N. Enhancing federated learning in fog-aided IoT by CPU frequency and wireless power control. IEEE Internet Things J. 2020; 8: 3438–3445. 10.1109/JIOT.2020.3022590
10. Zhao Z., Xia J., Fan L., Lei X., Karagiannidis G.K., Nallanathan A. System optimization of federated learning networks with a constrained latency. IEEE Trans. Veh. Technol. 2021; 71: 1095–1100. 10.1109/TVT.2021.3128559
11. Yu L., Albelaihi R., Sun X., Ansari N., Devetsikiotis M. Jointly optimizing client selection and resource management in wireless federated learning for internet of things. IEEE Internet Things J. 2021; 9: 4385–4395. 10.1109/JIOT.2021.3103715
12. Chen M., Yang Z., Saad W., Yin C., Poor H.V., Cui S. A joint learning and communications framework for federated learning over wireless networks. IEEE Trans. Wirel. Commun. 2020; 20: 269–283. 10.1109/TWC.2020.3024629
13. Liu S., Yu G., Chen X., Bennis M. Joint user association and resource allocation for wireless hierarchical federated learning with IID and non-IID data. IEEE Trans. Wirel. Commun. 2022; 21: 7852–7866. 10.1109/TWC.2022.3162595
14. Huang T., Lin W., Wu W., He L., Li K., Zomaya A.Y. An efficiency-boosting client selection scheme for federated learning with fairness guarantee. IEEE Trans. Parallel Distrib. Syst. 2020; 32: 1552–1564. 10.1109/TPDS.2020.3040887
15. Guo K., Chen Z., Yang H.H., Quek T.Q. Dynamic scheduling for heterogeneous federated learning in private 5G edge networks. IEEE J. Sel. Top. Signal Process. 2021; 16: 26–40. 10.1109/JSTSP.2021.3126174
16. Battiloro C., Di Lorenzo P., Merluzzi M., Barbarossa S. Lyapunov-based optimization of edge resources for energy-efficient adaptive federated learning. IEEE Trans. Green Commun. Netw. 2022; 7: 265–280. 10.1109/TGCN.2022.3186879
17. Xu J., Wang H. Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective. IEEE Trans. Wirel. Commun. 2020; 20: 1188–1200. 10.1109/TWC.2020.3031503
18. Ji Y., Kou Z., Zhong X., Li H., Yang F., Zhang S. Client selection and bandwidth allocation for federated learning: An online optimization perspective. In Proceedings of GLOBECOM 2022—IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022.
19. Ji X., Tian J., Zhang H., Wu D., Li T. Joint device selection and bandwidth allocation for cost-efficient federated learning in industrial Internet of Things. IEEE Internet Things J. 2023; 10: 9148–9160. 10.1109/JIOT.2022.3233595
20. Neely M.J. Stochastic Network Optimization with Application to Communication and Queueing Systems. Synth. Lect. Commun. Netw. 2010; 3: 1–211.
21. LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE 1998; 86: 2278–2324. 10.1109/5.726791

