The road to in-depth Kafka research: the ZooKeeper and Kafka configuration files in detail

Contents
1/ZooKeeper configuration file in detail
2/Kafka configuration parameters in detail
3/Notes on ZooKeeper and Kafka configuration files in production
4/Kafka commands in detail

1/ZooKeeper configuration file in detail (installed from zookeeper-3.4.14.tar.gz)
After unpacking ZooKeeper, we work from the sample file zoo_sample.cfg in the conf directory. The first step is to copy it to zoo.cfg:
[hadoop@kafka01-55-11 conf]$ cp zoo_sample.cfg zoo.cfg
[hadoop@kafka01-55-11 conf]$ grep '^[a-zA-Z]' zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
// The above are the default parameters
The sample zoo.cfg sets five parameters by default: 1. tickTime, 2. initLimit, 3. syncLimit, 4. dataDir, 5. clientPort
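The grep above simply keeps lines that start with a letter, which drops comments and blank lines. A minimal Python sketch of the same filtering, under the simplifying assumption that every active line is a plain key=value pair (ZooKeeper itself loads zoo.cfg with Java's Properties, which accepts more syntax); parse_zoo_cfg and SAMPLE are hypothetical names:

```python
import re

# Mirrors `grep '^[a-zA-Z]' zoo.cfg`: keep lines that start with a letter,
# which skips comments (#...), blank lines and indented lines.
ACTIVE_LINE = re.compile(r"^[A-Za-z]")


def parse_zoo_cfg(text):
    """Return the active settings of a zoo.cfg-style text as a dict of strings."""
    settings = {}
    for line in text.splitlines():
        if not ACTIVE_LINE.match(line):
            continue  # comment, blank or indented line
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '='
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
"""

if __name__ == "__main__":
    for key, value in parse_zoo_cfg(SAMPLE).items():
        print(f"{key}={value}")
```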

[hadoop@tencent-kafka01-39-110 conf]$ cat zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/data/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3                               # Used together with the following parameter (autopurge.purgeInterval); specifies the number of snapshots to keep. The default is 3.
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1                                  # Specifies the purge frequency in hours; it must be an integer of 1 or greater to enable purging. The default is 0 (autopurge disabled).
server.1=10.9.39.110:2888:3888
server.2=10.9.139.65:2888:3888
server.3=10.9.35.206:2888:3888
server.4=10.9.88.40:2888:3888
server.5=10.9.74.126:2888:3888
autopurge.snapRetainCount=20
autopurge.purgeInterval=5

// With the default tickTime=2000 (2 s), minSessionTimeout and maxSessionTimeout default to 4 s and 40 s respectively (2x and 20x tickTime).
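The 4 s / 40 s figures follow directly from the default multipliers (2x and 20x tickTime). A tiny sketch of that arithmetic; default_session_timeout_bounds is a hypothetical helper name:

```python
def default_session_timeout_bounds(tick_time_ms):
    """Default (minSessionTimeout, maxSessionTimeout) in ms when neither is set."""
    return 2 * tick_time_ms, 20 * tick_time_ms


if __name__ == "__main__":
    lo, hi = default_session_timeout_bounds(2000)
    print(lo / 1000, hi / 1000)  # prints: 4.0 40.0
```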

Explained line by line:

[hadoop@kafka01-55-11 conf]$ cat zoo_sample.cfg         // The default ZooKeeper parameters, annotated:
# The number of milliseconds of each tick
tickTime=2000                                           # The basic time unit ("tick") in milliseconds, used as the heartbeat interval. The default is 2000 ms, i.e. one heartbeat every 2 seconds. Heartbeats are exchanged between client and server and between servers; besides monitoring whether machines are alive, they bound the communication timing between Followers and the Leader. By default the minimum session timeout is twice tickTime.
# The number of ticks that the initial
# synchronization phase can take
initLimit=10                                            # The maximum number of ticks tolerated for the initial connection between a follower server (F) and the leader server (L) in the cluster.
                                                        # The ClickHouse documentation's ZooKeeper example uses initLimit=30000
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5                                             # The maximum number of ticks tolerated between a request and its acknowledgement between a follower server (F) and the leader server (L) in the cluster.
                                                        # The ClickHouse documentation uses syncLimit=10
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper                                  # Where snapshots are stored (under the version-2 subdirectory); this directory also holds the myid file (the server's unique ID) and, unless dataLogDir is set, the transaction logs.
# the port at which the clients will connect
clientPort=2181                                    # The port ZooKeeper listens on for client connections and requests. The default is 2181.
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60                                   # Increase this value if you need to handle more clients; the ClickHouse documentation uses 2000
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3                       # Used together with the following parameter; specifies the number of snapshots to keep. The default is 3. The ClickHouse documentation sets it to 10.
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1                        # Specifies the purge frequency in hours; it must be an integer of 1 or greater to enable purging. The default is 0 (autopurge disabled).
// Explanation
// autopurge.purgeInterval specifies the purge frequency in hours; it must be an integer of 1 or greater to enable purging, and the default is 0 (disabled).
// autopurge.snapRetainCount is used together with autopurge.purgeInterval and specifies the number of snapshots to keep. The default is 3.
// With the default tickTime=2000 (2 s), minSessionTimeout and maxSessionTimeout default to 4 s and 40 s respectively.
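initLimit and syncLimit are counted in ticks, so their wall-clock meaning depends on tickTime. A small sketch converting the sample values and the ClickHouse values quoted above, assuming tickTime=2000; limit_to_seconds is a hypothetical helper:

```python
def limit_to_seconds(ticks, tick_time_ms=2000):
    """Convert a tick-based limit (initLimit or syncLimit) to seconds."""
    return ticks * tick_time_ms / 1000


if __name__ == "__main__":
    examples = [
        ("initLimit", 10),
        ("syncLimit", 5),
        ("initLimit (ClickHouse)", 30000),
        ("syncLimit (ClickHouse)", 10),
    ]
    for name, ticks in examples:
        print(f"{name}={ticks} -> {limit_to_seconds(ticks):g} s")
```

With tickTime=2000, the sample gives 20 s for initial sync and 10 s for syncLimit, while initLimit=30000 allows roughly 16.7 hours for a follower's initial synchronization.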

// minSessionTimeout and maxSessionTimeout are not in the default file; we set maxSessionTimeout=90000 (90 s). When a client connects to ZooKeeper it requests a session timeout. If the client cannot reach the ZooKeeper server for longer than that, the session expires and any ephemeral (temporary) nodes created in that session are deleted. The client cannot choose an arbitrary value, though: the server uses these two parameters to limit the range the client may request.
# The ClickHouse documentation recommends maxSessionTimeout=60000000 (1000 minutes), so we can consider increasing this parameter.
ZooKeeper advanced configuration items:
globalOutstandingLimit: limits the maximum number of requests waiting to be processed (Java system property: zookeeper.globalOutstandingLimit).
Clients may submit requests faster than the server can process them, so requests queue up on the server and can eventually (within a few seconds) exhaust its memory. To avoid this, once the number of pending requests reaches globalOutstandingLimit, the server rejects further client requests. This is not a hard limit, however: each client must be allowed at least one outstanding request, or its connection starts timing out. So when globalOutstandingLimit is reached, the server keeps reading only from client connections that have no pending request.
To estimate the limit for a single server, simply divide the value of this setting by the number of servers. There is no clever way to derive a precise value; overall, this setting is the upper bound on outstanding requests. In practice the load is never perfectly balanced across servers, so some servers carry more load than others even when the overall limit is not reached.
The default limit is 1000 requests. Usually you do not need to change it; only if many clients send very large requests might you need to lower it.
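The throttling rule and the per-server estimate described above can be captured in a few lines. This is a simplified model of the behaviour as described, not ZooKeeper's source; may_read_from and per_server_share are hypothetical names:

```python
def may_read_from(conn_pending, global_pending, global_limit=1000):
    """Simplified model: should the server read new requests from this connection?

    Below the limit every connection is read. Once the limit is reached, only
    connections with no request in flight are read, so every client can still
    have at least one outstanding request.
    """
    return global_pending < global_limit or conn_pending == 0


def per_server_share(global_limit, num_servers):
    """Rough per-server budget: the configured limit divided by the ensemble size."""
    return global_limit // num_servers


if __name__ == "__main__":
    print(may_read_from(conn_pending=3, global_pending=1000))  # prints: False
    print(may_read_from(conn_pending=0, global_pending=1000))  # prints: True
    print(per_server_share(1000, 5))  # prints: 200
```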

maxClientCnxns: the maximum number of socket connections a single client, identified by IP address, may open to one ZooKeeper server.
ZooKeeper uses flow control and limits to avoid connection overload. Connections consume far more resources than ordinary operations, and a sudden burst of connections can cause a denial of service, hence this limit. When an IP address exceeds it, the server refuses further connections from that address. The default is 60; we recommend 100.
clientPortAddress: by default the server listens for client connections on all network interfaces. Some servers have several interfaces, some internal and some external. If you do not want to expose an external interface, set this item to the address of an internal interface.
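A toy model of the per-IP limit, to make the "refuse beyond the limit" behaviour concrete. PerIpLimiter is a hypothetical class for illustration only (ZooKeeper enforces this inside its connection handling); as in zoo.cfg, a limit of 0 is treated as "no limit":

```python
from collections import Counter


class PerIpLimiter:
    """Toy model of maxClientCnxns: cap concurrent connections per client IP."""

    def __init__(self, max_client_cnxns=60):
        self.max = max_client_cnxns
        self.open = Counter()  # IP -> number of currently open connections

    def accept(self, ip):
        """Return True if a new connection from ip is accepted."""
        if self.max > 0 and self.open[ip] >= self.max:
            return False  # refuse: this IP already has max connections
        self.open[ip] += 1
        return True

    def close(self, ip):
        """Record that one connection from ip was closed."""
        if self.open[ip] > 0:
            self.open[ip] -= 1


if __name__ == "__main__":
    limiter = PerIpLimiter(max_client_cnxns=2)
    print([limiter.accept("10.0.0.1") for _ in range(3)])  # prints: [True, True, False]
```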
minSessionTimeout: the minimum session timeout in milliseconds. When a client connects it requests a specific timeout, but the negotiated timeout will not be lower than this value.
Developers would like client failures to be detected immediately and accurately. Unfortunately the system cannot detect them in real time; it relies on heartbeats and timeouts. A suitable timeout depends on the network latency and reliability between client and server. It must be at least the network round-trip time, and occasional packet loss makes responses slower still, because lost packets have to be retransmitted.
The default minSessionTimeout is twice tickTime. Setting it too low causes false detection of client failures; setting it too high delays the detection of real failures. // We generally leave this parameter alone.
maxSessionTimeout: the maximum session timeout in milliseconds. When a client connects it requests a specific timeout, but the negotiated timeout will not be higher than this value.
Although this setting does not affect system performance, it limits how long a client can hold on to system resources. The default is 20 times tickTime, so with tickTime=2 s the default maxSessionTimeout is 40 s. Our production configuration sets it to 90 s.
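Taken together, the two parameters clamp whatever the client asks for. A minimal sketch of that negotiation, assuming unset bounds fall back to 2x and 20x tickTime; negotiate_session_timeout is a hypothetical helper, not a ZooKeeper API:

```python
def negotiate_session_timeout(requested_ms, tick_time_ms=2000,
                              min_session_timeout=None, max_session_timeout=None):
    """Clamp a client's requested session timeout into the server's allowed range."""
    lo = min_session_timeout if min_session_timeout is not None else 2 * tick_time_ms
    hi = max_session_timeout if max_session_timeout is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))


if __name__ == "__main__":
    # A client asking for 60 s gets 40 s with defaults, 60 s with our 90 s cap.
    print(negotiate_session_timeout(60000))  # prints: 40000
    print(negotiate_session_timeout(60000, max_session_timeout=90000))  # prints: 60000
```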
preAllocSize:
Corresponding Java system property: zookeeper.preAllocSize.
The size, in kilobytes, of the disk space ZooKeeper preallocates for transaction log files. The default block size is 64 MB. One reason to change it is to reduce it when snapshots are taken more often. For example, if a new snapshot is taken every 1,000 transactions (see snapCount) and a new transaction log file is started after each snapshot, then with transactions of about 100 bytes each, preallocating about 100 KB for the transaction log is a better fit.
// The ClickHouse documentation uses preAllocSize = 131072 (in KB, i.e. 128 MB)
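The sizing example above is simple arithmetic; the only subtlety is that preAllocSize is expressed in kilobytes. A short sketch, assuming 1 KB = 1024 bytes and a fixed average transaction size (both helpers are hypothetical):

```python
import math


def log_bytes_between_snapshots(txns_per_log, avg_txn_bytes):
    """Approximate bytes written to one transaction log before it rolls."""
    return txns_per_log * avg_txn_bytes


def to_prealloc_kb(num_bytes):
    """preAllocSize is given in kilobytes (1 KB = 1024 bytes); round up."""
    return math.ceil(num_bytes / 1024)


if __name__ == "__main__":
    need = log_bytes_between_snapshots(1000, 100)  # the example from the text
    print(need, to_prealloc_kb(need))  # prints: 100000 98
    print(131072 // 1024, "MB")  # ClickHouse value; prints: 128 MB
    print(65536 // 1024, "MB")  # default block; prints: 64 MB
```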
snapCount
Corresponding Java system property: zookeeper.snapCount.
ZooKeeper records transactions in a transaction log. After snapCount transactions have been written to a log file, it takes a snapshot and starts a new transaction log file. The default snapCount is 100,000.
// The ClickHouse documentation uses snapCount = 3000000
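snapCount and preAllocSize interact: a larger snapCount means a longer transaction log between snapshots, so the file grows through more preallocated blocks. A rough upper-bound estimate, assuming a fixed 100-byte average transaction and at most snapCount transactions per log (ZooKeeper randomizes the exact trigger point somewhat so that servers do not all snapshot at once); prealloc_blocks is a hypothetical helper:

```python
import math


def prealloc_blocks(snap_count, avg_txn_bytes, prealloc_kb=65536):
    """Upper-bound number of preAllocSize blocks one transaction log grows through."""
    log_bytes = snap_count * avg_txn_bytes  # at most snapCount transactions per log
    block_bytes = prealloc_kb * 1024  # preAllocSize is in KB
    return math.ceil(log_bytes / block_bytes)


if __name__ == "__main__":
    print(prealloc_blocks(100000, 100))  # default snapCount, 64 MB blocks; prints: 1
    print(prealloc_blocks(3000000, 100, 131072))  # ClickHouse values; prints: 3
```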
leaderServes
Corresponding Java system property: zookeeper.leaderServes.
Controls whether the Leader accepts client connections. The default is "yes", meaning the Leader accepts client connections. In ZooKeeper the Leader's main job is coordinating transactional (update) requests. When update throughput is high and read throughput is low, the Leader can be configured not to accept client connections so that it can focus on coordination.
Note: the ZooKeeper documentation highly recommends turning on leader selection when the ensemble has more than three servers.
// The ClickHouse documentation's configuration uses leaderServes=yes (the default)

ZooKeeper configuration tuning for performance (reposted): https://www.cnblogs.com/EasonJim/p/7488834.html
// The parameters in this link are important and explained in detail

There is a pitfall here:
 server.1=10.2.10.174:2888:3888    // Write the IP here; do not write the hostname, as in server.1=emm-kafka01-10--174:2888:3888
 Why?
With hostnames, ZooKeeper starts and Kafka starts, but later, when Kafka resolves the ZooKeeper addresses through /etc/hosts, problems appear.
 So we use IPs here; this is worth keeping in mind.

(2181 is the port clients use to connect to the server)
(2888 is the port used for communication between the leader and the followers)
(3888 is the port the servers use for voting during leader election)
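To catch the hostname pitfall before starting the ensemble, the server.N lines can be checked mechanically. A minimal sketch that parses server.N=host:peerPort:electionPort and flags hosts that are not literal IP addresses; it assumes IPv4 addresses (the host pattern does not handle IPv6), and parse_servers / hosts_not_ip are hypothetical names:

```python
import ipaddress
import re

# server.<id>=<host>:<peer port 2888>:<election port 3888>
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:]+):(\d+):(\d+)")


def parse_servers(lines):
    """Parse server.N entries from zoo.cfg lines into {id: (host, peer, election)}."""
    servers = {}
    for line in lines:
        m = SERVER_LINE.match(line.strip())
        if m:
            sid, host, peer, election = m.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


def hosts_not_ip(servers):
    """Return the ids of servers whose host is a name rather than an IP address."""
    bad = []
    for sid, (host, _, _) in sorted(servers.items()):
        try:
            ipaddress.ip_address(host)
        except ValueError:
            bad.append(sid)
    return bad


if __name__ == "__main__":
    cfg = ["server.1=10.2.10.174:2888:3888",
           "server.2=emm-kafka01-10--174:2888:3888"]
    print(hosts_not_ip(parse_servers(cfg)))  # prints: [2]
```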

A good detailed explanation of the ZooKeeper configuration parameters (zoo.cfg): https://www.orchome.com/1419
#################################################################
grep '^[a-zA-Z]' zoo.cfg
tickTime=2000                      # The basic time unit ("tick") in milliseconds, used as the heartbeat interval. The default is 2000 ms, i.e. one heartbeat every 2 seconds.
//tickTime meaning: a heartbeat is sent every tickTime between client and server and between servers. Besides monitoring whether machines are working, heartbeats bound the communication timing between Followers and the Leader; by default the minimum session timeout is twice tickTime.
initLimit=10                       # The maximum number of ticks tolerated for the initial connection between a follower server (F) and the leader server (L) in the cluster.
//In plain terms: during startup a Follower synchronizes all the latest data from the Leader and then determines the state in which it starts serving clients. The Leader allows F to finish this within initLimit ticks.
syncLimit=5                        # The maximum number of ticks tolerated between a request and its acknowledgement between a follower server (F) and the leader server (L) in the cluster.
//In plain terms: at run time the Leader communicates with every machine in the ZK cluster, for example checking whether they are alive through a heartbeat mechanism. If L gets no response from F within syncLimit ticks of sending a heartbeat, F is considered offline.
dataDir=/data/zookeeper/dataDir     # Where snapshots are stored; also holds the myid file (the server's unique ID) and, unless dataLogDir is set, the transaction logs.
clientPort=2181                     # The port ZooKeeper listens on for client connections and requests. The default is 2181.
maxSessionTimeout=90000            # 90 s. minSessionTimeout/maxSessionTimeout: a client requests a session timeout when it connects; if it cannot reach the ZooKeeper server for longer than that, the session expires and all ephemeral nodes of the session are deleted. The server uses these two parameters to limit the range the client may request.
maxClientCnxns=60                  # Increase this value if you need to process more clients
server.1=10.9.39.110:2888:3888
server.2=10.9.139.65:2888:3888
server.3=10.9.35.206:2888:3888
server.4=10.9.88.40:2888:3888
server.5=10.9.74.126:2888:3888
autopurge.snapRetainCount=20        # keep the 20 most recent snapshots
autopurge.purgeInterval=5           # run the purge task every 5 hours
//Our configuration: purge every 5 hours and keep 20 snapshot files. For production, a 48-hour purge interval with 20 files is recommended; online we purge every 5 hours.
#################################################################
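A simplified model of what autopurge keeps, for snapshots only. It assumes the standard file naming snapshot.<zxid in hex> and the minimum retain count of 3; ZooKeeper also deletes transaction logs no longer needed by the kept snapshots, which this sketch does not model. snapshots_to_delete is a hypothetical helper:

```python
def snapshots_to_delete(filenames, snap_retain_count=3):
    """Return the snapshot files autopurge would remove (snapshots only).

    Files are named snapshot.<zxid in hex>; the newest snap_retain_count
    by zxid are kept. Values below 3 are raised to 3.
    """
    snaps = [f for f in filenames if f.startswith("snapshot.")]
    snaps.sort(key=lambda f: int(f.split(".", 1)[1], 16))  # oldest first
    keep = max(snap_retain_count, 3)
    return snaps[:-keep]


if __name__ == "__main__":
    files = ["snapshot.100", "log.100", "snapshot.2a",
             "snapshot.1ff", "snapshot.300", "snapshot.5"]
    print(snapshots_to_delete(files, 3))  # prints: ['snapshot.5', 'snapshot.2a']
```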


Posted on Wed, 11 Sep 2019 10:14:28 -0700 by xploita