Hadoop Source Code Analysis (Part 3): Startup and Script Walkthrough

1. Startup

Hadoop is started through the scripts under its sbin directory. The startup-related scripts are the following:

start-all.sh, start-dfs.sh, start-yarn.sh, hadoop-daemon.sh, and yarn-daemon.sh.

hadoop-daemon.sh is used to start the HDFS-related daemons.

yarn-daemon.sh is used to start the YARN-related daemons.

start-dfs.sh is used to start the HDFS cluster.

start-yarn.sh is used to start the YARN cluster.

start-all.sh is used to start both the HDFS and YARN clusters.

The scripts whose names begin with start- all do their work by calling the two daemon scripts.
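
As a quick illustration, these scripts are typically invoked as shown below (a hedged sketch: $HADOOP_HOME is assumed to point at the installation directory, and passwordless ssh to the worker hosts is assumed to be configured; neither assumption comes from this article):

# Deprecated one-shot startup; as its own message says, it simply calls the two scripts below.
$HADOOP_HOME/sbin/start-all.sh

# Preferred: start the HDFS and YARN clusters separately.
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Start a single daemon on the local machine only.
$HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager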

2. Script Analysis

We start from start-all.sh and then follow its calls into the other scripts step by step.

The content of start-all.sh is as follows:

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Start all hadoop daemons. Run this on master node.

echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh"

bin=`dirname "${BASH_SOURCE-$0}"`

bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# start hdfs daemons if hdfs is present

if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then

"${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR

fi

# start yarn daemons if yarn is present

if [ -f "${HADOOP_YARN_HOME}"/sbin/startImjxFzypZj-yarn.sh ]; then

"${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR

fi

The key part of this script is at the end: two if statements. The first one calls the start-dfs.sh script and the second calls start-yarn.sh, in both cases passing --config $HADOOP_CONF_DIR.

Next, taking start-dfs.sh as the example, we continue the analysis.

The content of start-dfs.sh is as follows:

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Start hadoop dfs daemons.

# Optionally upgrade or rollback dfs state.

# Run this on master node.

usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"

bin=`dirname "${BASH_SOURCE-$0}"`

bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

# get arguments

if [[ $# -ge 1 ]]; then

startOpt="$1"

shift

case "$startOpt" in

-upgrade)

nameStartOpt="$startOpt"

;;

-rollback)

dataStartOpt="$startOpt"

;;

*)

echo $usage

exit 1

;;

esac

fi

#Add other possible options

nameStartOpt="$nameStartOpt $@"

#---------------------------------------------------------

# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Starting namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

--config "$HADOOP_CONF_DIR" \

--hostnames "$NAMENODES" \

--script "$bin/hdfs" start namenode $nameStartOpt

#---------------------------------------------------------

# datanodes (using default slaves file)

if [ -n "$HADOOP_SECURE_DN_USER" ]; then

echo \

"Attempting to start secure cluster, skipping datanodes. " \

"Run start-secure-dns.sh as root to complete startup."

else

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

--config "$HADOOP_CONF_DIR" \

--script "$bin/hdfs" start datanode $dataStartOpt

fi

#---------------------------------------------------------

# secondary namenodes (if any)

SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)

if [ -n "$SECONDARY_NAMENODES" ]; then

echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

--config "$HADOOP_CONF_DIR" \

--hostnames "$SECONDARY_NAMENODES" \

--script "$bin/hdfs" start secondarynamenode

fi

#---------------------------------------------------------

# quorumjournal nodes (if any)

SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)

case "$SHARED_EDITS_DIR" in

qjournal://*)

JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')

echo "Starting journal nodes [$JOURNAL_NODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

--config "$HADOOP_CONF_DIR" \

--hostnames "$JOURNAL_NODES" \

--script "$bin/hdfs" start journalnode ;;

esac

#---------------------------------------------------------

# ZK Failover controllers, if auto-HA is enabled

AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)

if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then

echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

--config "$HADOOP_CONF_DIR" \

--hostnames "$NAMENODES" \

--script "$bin/hdfs" start zkfc

fi

# eof

The first block after the license header resolves Hadoop's paths and sources hdfs-config.sh.

The block after that parses the arguments passed to the script (-upgrade, -rollback and any additional options). Once that is done, the script starts the individual HDFS roles.

The namenode is started first, by calling the hadoop-daemons.sh script. The difference between this script and the hadoop-daemon.sh script mentioned earlier is that hadoop-daemons.sh can start the required role on other machines in the cluster, whereas hadoop-daemon.sh only starts roles on the current machine. In fact, hadoop-daemons.sh itself works by calling hadoop-daemon.sh, which is analyzed a little later. The call also carries several parameters, the most important being "start namenode", which means: start the namenode.
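
For reference, that call is roughly equivalent to the manual invocations below (a hedged sketch; the hostnames nn1 and nn2 are illustrative and not taken from the article):

# Fan the same command out to the namenode hosts over ssh:
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" \
  --hostnames "nn1 nn2" --script "$HADOOP_PREFIX/bin/hdfs" start namenode

# The same action restricted to the local machine:
"$HADOOP_PREFIX/sbin/hadoop-daemon.sh" --config "$HADOOP_CONF_DIR" \
  --script "$HADOOP_PREFIX/bin/hdfs" start namenode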

Next, the datanodes are started in the same way as the namenode (unless a secure-datanode user is configured, in which case the script only prints a hint to run start-secure-dns.sh as root).

Then the secondary namenode is started. If namenode high availability (HA) is configured, there is no secondary namenode to start and this step does nothing.

Then the journalnodes are started. These are only started when namenode HA is configured, i.e. when dfs.namenode.shared.edits.dir is a qjournal:// URI.

Finally, the ZKFC (ZooKeeper Failover Controller) daemons are started on the namenode hosts. These, too, are only started when automatic failover is enabled.

With the HA setup configured in part (2) of this series, the roles started here should be: namenode, datanode, journalnode and zkfc. The getconf queries the script uses to make these decisions can also be run by hand, as sketched below.
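
The same hdfs getconf queries that start-dfs.sh relies on can be issued manually to check what will be started (the sample outputs in the comments are illustrative values for an HA setup, not taken from the article):

$HADOOP_PREFIX/bin/hdfs getconf -namenodes           # e.g. "nn1 nn2"
$HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes  # usually empty when HA is configured
$HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir
# e.g. qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster
$HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled
# "true" when the zkfc daemons should be started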

The hadoop-daemons.sh script used to start the roles above has the following content:

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Run a Hadoop command on all slave hosts.

usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."

# if no args specified, show usage

if [ $# -le 1 ]; then

echo $usage

exit 1

fi

bin=`dirname "${BASH_SOURCE-$0}"`

bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

The key part of this script is its last line, which involves two other scripts: slaves.sh and hadoop-daemon.sh. slaves.sh logs into each of the specified servers over ssh and runs the hadoop-daemon.sh command there. slaves.sh is not analyzed in detail here; a simplified sketch of what it does follows.
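
In essence, slaves.sh behaves roughly like the loop below (a simplified sketch, assuming the host list comes from $HADOOP_SLAVES or the slaves file under $HADOOP_CONF_DIR; the real script handles more options):

# Read the worker host list and run the remaining arguments on every host via ssh.
HOSTLIST="${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}"
for slave in $(sed 's/#.*$//;/^$/d' "$HOSTLIST"); do
  ssh $HADOOP_SSH_OPTS "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
done
wait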

We now continue with the hadoop-daemon.sh script.

Its content is as follows:

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Runs a Hadoop command as a daemon.

#

# Environment Variables

#

# HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.

# HADOOP_LOG_DIR Where log files are stored. PWD by default.

# HADOOP_MASTER host:path where hadoop code should be rsync'd from

# HADOOP_PID_DIR The pid files are stored. /tmp by default.

# HADOOP_IDENT_STRING A string representing this instance of hadoop. $USER by default

# HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.

##

usage="Usage: hadoop-daemon.sh [--config ] [--hosts hostlistfile] [--script script] (start|stop) "

# if no args specified, show usage

if [ $# -le 1 ]; then

echo $usage

exit 1

fi

bin=`dirname "${BASH_SOURCE-$0}"`

bin=`cd "$bin"; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

# get arguments

#default value

hadoopScript="$HADOOP_PREFIX"/bin/hadoop

if [ "--script" = "$1" ]

then

shift

hadoopScript=$1

shift

fi

startStop=$1

shift

command=$1

shift

hadoop_rotate_log ()

{

log=$1;

num=5;

if [ -n "$2" ]; then

num=$2

fi

if [ -f "$log" ]; then # rotate logs

while [ $num -gt 1 ]; do

prev=`expr $num - 1`

[ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"

num=$prev

done

mv "$log" "$log.$num";

fi

}

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then

. "${HADOOP_CONF_DIR}/hadoop-env.sh"

fi

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables

if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then

export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR

export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR

export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER

starting_secure_dn="true"

fi

#Determine if we're starting a privileged NFS, if so, redefine the appropriate variables

if [ "$command" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then

export HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR

export HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR

export HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER

starting_privileged_nfs="true"

fi

if [ "$HADOOP_IDENT_STRING" = "" ]; then

export HADOOP_IDENT_STRING="$USER"

fi

# get log directory

if [ "$HADOOP_LOG_DIR" = "" ]; then

export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"

fi

if [ ! -w "$HADOOP_LOG_DIR" ] ; then

mkdir -p "$HADOOP_LOG_DIR"

chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR

fi

if [ "$HADOOP_PID_DIR" = "" ]; then

HADOOP_PID_DIR=/tmp

fi

# some variables

export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log

export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}

export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}

export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}

log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out

pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid

HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}

# Set default scheduling priority

if [ "$HADOOP_NICENESS" = "" ]; then

export HADOOP_NICENESS=0

fi

case $startStop in

(start)

[ -w "$HADOOP_PID_DIR" ] || mkdir -p "$HADOOP_PID_DIR"

if [ -f $pid ]; then

if kill -0 `cat $pid` > /dev/null 2>&1; then

echo $command running as process `cat $pid`. Stop it first.

exit 1

fi

fi

if [ "$HADOOP_MASTER" != "" ]; then

echo rsync from $HADOOP_MASTER

rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"

fi

hadoop_rotate_log $log

echo starting $command, logging to $log

cd "$HADOOP_PREFIX"

case $command in

namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)

if [ -z "$HADOOP_HDFS_HOME" ]; then

hdfsScript="$HADOOP_PREFIX"/bin/hdfs

else

hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs

fi

nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &

;;

(*)

nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &

;;

esac

echo $! > $pid

sleep 1

head "$log"

# capture the ulimit output

if [ "true" = "$starting_secure_dn" ]; then

echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log

# capture the ulimit info for the appropriate user

su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1

elif [ "true" = "$starting_privileged_nfs" ]; then

echo "ulimit -a for privileged nfs user $HADOOP_PRIVILEGED_NFS_USER" >> $log

su --shell=/bin/bash $HADOOP_PRIVILEGED_NFS_USER -c 'ulimit -a' >> $log 2>&1

else

echo "ulimit -a for user $USER" >> $log

ulimit -a >> $log 2>&1

fi

sleep 3;

if ! ps -p $! > /dev/null ; then

exit 1

fi

;;

(stop)

if [ -f $pid ]; then

TARGET_PID=`cat $pid`

if kill -0 $TARGET_PID > /dev/null 2>&1; then

echo stopping $command

kill $TARGET_PID

sleep $HADOOP_STOP_TIMEOUT

if kill -0 $TARGET_PID > /dev/null 2>&1; then

echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"

kill -9 $TARGET_PID

fi

else

echo no $command to stop

fi

rm -f $pid

else

echo no $command to stop

fi

;;

(*)

echo $usage

exit 1

;;

esac

The key part of this script is the case statement at the end; that is the code that actually starts or stops a service. When the script is invoked it is passed two important arguments, start or stop followed by the command (role) name, which are used to start or stop the given service. Taking the start branch as an example, the crucial step is the line that launches $hdfsScript with nohup and nice. Looking at where hdfsScript is defined just above, you can see that it is actually the hdfs file in Hadoop's bin directory (or in $HADOOP_HDFS_HOME/bin when that variable is set).
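
Substituting the variables, the start branch for a namenode expands to roughly the following (a hedged illustration; the installation path, user name and host name are made up):

# Approximate expansion of the start branch for "start namenode":
nohup nice -n 0 /opt/hadoop/bin/hdfs --config /opt/hadoop/etc/hadoop namenode \
  > /opt/hadoop/logs/hadoop-hadoop-namenode-node1.out 2>&1 < /dev/null &
echo $! > /tmp/hadoop-hadoop-namenode.pid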

The content of that hdfs script is as follows:

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements. See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License. You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Environment Variables

#

# JSVC_HOME home directory of jsvc binary. Required for starting secure

# datanode.

#

# JSVC_OUTFILE path to jsvc output file. Defaults to

# $HADOOP_LOG_DIR/jsvc.out.

#

# JSVC_ERRFILE path to jsvc error file. Defaults to $HADOOP_LOG_DIR/jsvc.err.

bin=`which $0`

bin=`dirname ${bin}`

bin=`cd "$bin" > /dev/null; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/../libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

function print_usage(){

echo "Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND"

echo " where COMMAND is one of:"

echo " dfs run a filesystem command on the file systems supported in Hadoop."

echo " classpath prints the classpath"

echo " namenode -format format the DFS filesystem"

echo " secondarynamenode run the DFS secondary namenode"

echo " namenode run the DFS namenode"

echo " journalnode run the DFS journalnode"

echo " zkfc run the ZK Failover Controller daemon"

echo " datanode run a DFS datanode"

echo " dfsadmin run a DFS admin client"

echo " haadmin run a DFS HA admin client"

echo " fsck run a DFS filesystem checking utility"

echo " balancer run a cluster balancing utility"

echo " jmxget get JMX exported values from NameNode or DataNode."

echo " mover run a utility to move block replicas across"

echo " storage types"

echo " oiv apply the offline fsimage viewer to an fsimage"

echo " oiv_legacy apply the offline fsimage viewer to an legacy fsimage"

echo " oev apply the offline edits viewer to an edits file"

echo " fetchdt fetch a delegation token from the NameNode"

echo " getconf get config values from configuration"

echo " groups get the groups which users belong to"

echo " snapshotDiff diff two snapshots of a directory or diff the"

echo " current directory contents with a snapshot"

echo " lsSnapshottableDir list all snapshottable dirs owned by the current user"

echo " Use -help to see options"

echo " portmap run a portmap service"

echo " nfs3 run an NFS version 3 gateway"

echo " cacheadmin configure the HDFS cache"

echo " crypto configure HDFS encryption zones"

echo " storagepolicies list/get/set block storage policies"

echo " version print the version"

echo ""

echo "Most commands print help when invoked w/o parameters."

# There are also debug commands, but they don't show up in this listing.

}

if [ $# = 0 ]; then

print_usage

exit

fi

COMMAND=$1

shift

case $COMMAND in

# usage flags

--help|-help|-h)

print_usage

exit

;;

esac

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables

if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then

if [ -n "$JSVC_HOME" ]; then

if [ -n "$HADOOP_SECURE_DN_PID_DIR" ]; then

HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR

fi

if [ -n "$HADOOP_SECURE_DN_LOG_DIR" ]; then

HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"

fi

HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"

starting_secure_dn="true"

else

echo "It looks like you're trying to start a secure DN, but \$JSVC_HOME"\

"isn't set. Falling back to starting insecure DN."

fi

fi

# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables

if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then

if [ -n "$JSVC_HOME" ]; then

if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then

HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR

fi

if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then

HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"

fi

HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"

starting_privileged_nfs="true"

else

echo "It looks like you're trying to start a privileged NFS server, but"\

"\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."

fi

fi

if [ "$COMMAND" = "namenode" ] ; then

CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"

elif [ "$COMMAND" = "zkfc" ] ; then

CLASS='org.apache.hadoop.hdfs.tools.DFSZKFailoverController'

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_ZKFC_OPTS"

elif [ "$COMMAND" = "secondarynamenode" ] ; then

CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"

elif [ "$COMMAND" = "datanode" ] ; then

CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'

if [ "$starting_secure_dn" = "true" ]; then

HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"

else

HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"

fi

elif [ "$COMMAND" = "journalnode" ] ; then

CLASS='org.apache.hadoop.hdfs.qjournal.server.JournalNode'

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOURNALNODE_OPTS"

elif [ "$COMMAND" = "dfs" ] ; then

CLASS=org.apache.hadoop.fs.FsShell

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "dfsadmin" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "haadmin" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.DFSHAAdmin

CLASSPATH=${CLASSPATH}:${TOOL_PATH}

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "fsck" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.DFSck

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "balancer" ] ; then

CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"

elif [ "$COMMAND" = "mover" ] ; then

CLASS=org.apache.hadoop.hdfs.server.mover.Mover

HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_MOVER_OPTS}"

elif [ "$COMMAND" = "storagepolicies" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.StoragePolicyAdmin

elif [ "$COMMAND" = "jmxget" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.JMXGet

elif [ "$COMMAND" = "oiv" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB

elif [ "$COMMAND" = "oiv_legacy" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer

elif [ "$COMMAND" = "oev" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer

elif [ "$COMMAND" = "fetchdt" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher

elif [ "$COMMAND" = "getconf" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.GetConf

elif [ "$COMMAND" = "groups" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.GetGroups

elif [ "$COMMAND" = "snapshotDiff" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff

elif [ "$COMMAND" = "lsSnapshottableDir" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.snapshot.LsSnapshottableDir

elif [ "$COMMAND" = "portmap" ] ; then

CLASS=org.apache.hadoop.portmap.Portmap

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_PORTMAP_OPTS"

elif [ "$COMMAND" = "nfs3" ] ; then

CLASS=org.apache.hadoop.hdfs.nfs.nfs3.Nfs3

HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NFS3_OPTS"

elif [ "$COMMAND" = "cacheadmin" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.CacheAdmin

elif [ "$COMMAND" = "crypto" ] ; then

CLASS=org.apache.hadoop.hdfs.tools.CryptoAdmin

elif [ "$COMMAND" = "version" ] ; then

CLASS=org.apache.hadoop.util.VersionInfo

elif [ "$COMMAND" = "debug" ]; then

CLASS=org.apache.hadoop.hdfs.tools.DebugAdmin

elif [ "$COMMAND" = "classpath" ]; then

if [ "$#" -gt 0 ]; then

CLASS=org.apache.hadoop.util.Classpath

else

# No need to bother starting up a JVM for this simple case.

if $cygwin; then

CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)

fi

echo $CLASSPATH

exit 0

fi

else

CLASS="$COMMAND"

fi

# cygwin path translation

if $cygwin; then

CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)

HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR" 2>/dev/null)

HADOOP_PREFIX=$(cygpath -w "$HADOOP_PREFIX" 2>/dev/null)

HADOOP_CONF_DIR=$(cygpath -w "$HADOOP_CONF_DIR" 2>/dev/null)

HADOOP_COMMON_HOME=$(cygpath -w "$HADOOP_COMMON_HOME" 2>/dev/null)

HADOOP_HDFS_HOME=$(cygpath -w "$HADOOP_HDFS_HOME" 2>/dev/null)

HADOOP_YARN_HOME=$(cygpath -w "$HADOOP_YARN_HOME" 2>/dev/null)

HADOOP_MAPRED_HOME=$(cygpath -w "$HADOOP_MAPRED_HOME" 2>/dev/null)

fi

export CLASSPATH=$CLASSPATH

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

# Check to see if we should start a secure datanode

if [ "$starting_secure_dn" = "true" ]; then

if [ "$HADOOP_PID_DIR" = "" ]; then

HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"

else

HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"

fi

JSVC=$JSVC_HOME/jsvc

if [ ! -f $JSVC ]; then

echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes. "

echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\

"and set JSVC_HOME to the directory containing the jsvc binary."

exit

fi

if [[ ! $JSVC_OUTFILE ]]; then

JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"

fi

if [[ ! $JSVC_ERRFILE ]]; then

JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"

fi

exec "$JSVC" \

-Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \

-errfile "$JSVC_ERRFILE" \

-pidfile "$HADOOP_SECURE_DN_PID" \

-nodetach \

-user "$HADOOP_SECURE_DN_USER" \

-cp "$CLASSPATH" \

$JAVA_HEAP_MAX $HADOOP_OPTS \

org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"

elif [ "$starting_privileged_nfs" = "true" ] ; then

if [ "$HADOOP_PID_DIR" = "" ]; then

HADOOP_PRIVILEGED_NFS_PID="/tmp/hadoop_privileged_nfs3.pid"

else

HADOOP_PRIVILEGED_NFS_PID="$HADOOP_PID_DIR/hadoop_privileged_nfs3.pid"

fi

JSVC=$JSVC_HOME/jsvc

if [ ! -f $JSVC ]; then

echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run privileged NFS gateways. "

echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ "\

"and set JSVC_HOME to the directory containing the jsvc binary."

exit

fi

if [[ ! $JSVC_OUTFILE ]]; then

JSVC_OUTFILE="$HADOOP_LOG_DIR/nfs3_jsvc.out"

fi

if [[ ! $JSVC_ERRFILE ]]; then

JSVC_ERRFILE="$HADOOP_LOG_DIR/nfs3_jsvc.err"

fi

exec "$JSVC" \

-Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \

-errfile "$JSVC_ERRFILE" \

-pidfile "$HADOOP_PRIVILEGED_NFS_PID" \

-nodetach \

-user "$HADOOP_PRIVILEGED_NFS_USER" \

-cp "$CLASSPATH" \

$JAVA_HEAP_MAX $HADOOP_OPTS \

org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter "$@"

else

# run it

exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

fi

The key parts of this script are the long if/elif chain and the final exec. The if/elif chain, long as it is, has very simple logic: it assigns CLASS and HADOOP_OPTS according to the value of COMMAND that was passed in. The final exec then runs the class held in CLASS. Taking namenode as an example, CLASS is set to org.apache.hadoop.hdfs.server.namenode.NameNode, a Java class that is then executed to start the namenode.
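
For COMMAND=namenode the final "run it" branch expands to roughly the following (a hedged illustration; the heap size, log directory and identity string are invented defaults, and the classpath is omitted):

# Approximate expansion of the final exec for a namenode start:
exec "$JAVA" -Dproc_namenode -Xmx1000m \
  -Dhadoop.log.dir=/opt/hadoop/logs -Dhadoop.id.str=hadoop \
  -Dhadoop.security.logger=INFO,RFAS \
  $HADOOP_NAMENODE_OPTS \
  org.apache.hadoop.hdfs.server.namenode.NameNode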

If you want to debug the HDFS source code, this is the best place to set up remote debugging, because the individual service class and its startup options are visible here, so you can pinpoint exactly the service you need.
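
For example, to attach a remote debugger to the namenode, one common approach (an assumption on my part, not something stated in this article) is to add a standard JDWP agent to HADOOP_NAMENODE_OPTS in hadoop-env.sh and then start only that daemon:

# In $HADOOP_CONF_DIR/hadoop-env.sh; port 8000 is arbitrary, and suspend=y makes
# the JVM wait until a debugger attaches.
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS \
  -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"

# Then start just the namenode on this machine and attach the IDE to port 8000.
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --script $HADOOP_PREFIX/bin/hdfs start namenode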

That concludes this detailed look at how Hadoop starts up and how its startup scripts work. For more material on Hadoop source code analysis, please keep following our other related articles!
