Suppose we roll a die 20 times and are interested in the probability of seeing exactly two 5's, or we flip a coin 10 times and wonder how likely seeing exactly 6 heads might be, or we draw 7 cards (with replacement) from a deck and want to know how often we can expect to see an ace. At the heart of all of these examples is the notion of a binomial experiment:
There must be a fixed number of trials, which we typically denote by $n$. (e.g., the number of dice rolls, coin flips, card draws with replacement, etc.)
Each trial can have only two outcomes, which we frequently refer to as success and failure. (e.g., seeing a 5 vs. not seeing a 5, getting "heads" vs. getting "tails", drawing an ace vs. not drawing an ace, etc.)
The outcomes of the trials are independent of one another. (e.g., knowing the result of a previous die roll, coin flip, or card draw with replacement does not change the probabilities associated with any later trial.)
The probabilities of success and failure remain the same for each trial. We denote the probability of success by $p$ and the probability of failure by $q$. (e.g., for rolling a 5 on a single roll of a die, $p=1/6, q=5/6$; for seeing "heads" on a coin flip, $p=1/2, q=1/2$; for drawing an ace (with replacement) from a full deck, $p=1/13, q=12/13$.)
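To see these criteria in action, here is a minimal simulation sketch in Python (the function name and the seed are ours, for illustration only) that runs the die-rolling example above as a binomial experiment:

```python
import random

def run_binomial_experiment(n, p, seed=None):
    """Simulate n independent trials, each succeeding with probability p,
    and return the number of successes observed."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(n))

# The die-rolling example: n = 20 trials, where "success" means rolling a 5 (p = 1/6)
print(run_binomial_experiment(n=20, p=1/6, seed=0))
```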
If $X$ is a random variable that yields the number of successes seen in the trials of a binomial experiment, then we say that $X$ follows a binomial distribution.
We are, of course, interested in finding the probability that some particular number of successes is seen in the course of that binomial experiment. Reminding ourselves of the variables mentioned above that pin down the important characteristics of our experiment,
$$\begin{array}{l} n = \textrm{ the number of trials}\\ x = \textrm{ some number of successes, with } 0 \le x \le n\\ p = \textrm{ the probability of success on any one trial}\\ q = 1-p = \textrm{ the probability of failure on any one trial}\\ \end{array}$$we would then like to determine the probability of seeing exactly $x$ successes, $P(X=x)$, in terms of $x$ and these other variables $n$, $p$, and $q$.
Let us record the results (i.e., successes and failures for each trial) of a particular binomial experiment as a string of characters "$S$" and "$F$" in some order. So, for example, "$SSFSSFFFFF$" might be the result of a binomial experiment where the only successes seen in $n=10$ trials occurred in the first, second, fourth, and fifth trials.
Noting the independence of the outcomes of different trials, and recalling the Multiplication Rule for independent events, we quickly see that the probability of any single arrangement of $x$ successes and $n-x$ failures is $p^x q^{n-x}$.
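For instance, taking $p = 1/6$ as in the die-rolling example, the arrangement "$SSFSSFFFFF$" above has $4$ successes and $6$ failures, so the Multiplication Rule gives $$P(SSFSSFFFFF) = p \cdot p \cdot q \cdot p \cdot p \cdot q \cdot q \cdot q \cdot q \cdot q = p^4 q^6 = \left(\tfrac{1}{6}\right)^4 \left(\tfrac{5}{6}\right)^6 \approx 0.00026$$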
Also note that a single run of the experiment cannot produce exactly $x$ successes in two different orders. The events corresponding to the different orderings are thus disjoint, and recalling the Addition Rule for disjoint events, we have $P(X=x) = K p^x q^{n-x}$ where $K$ is the number of ways in which we can see exactly $x$ successes in $n$ trials.
However, this value of $K$ is identical to the number of ways we can order $x$ letters $S$ and $(n-x)$ letters $F$. This, in turn, is equivalent to the number of ways we can choose $x$ positions of the possible $n$ positions to be $S$, making the rest $F$. This result should be familiar -- there are ${}_nC_x$ possible ways in which exactly $x$ successes can occur. Substituting this value in for $K$ above, we have $$P(X=x) = {}_nC_x p^x q^{n-x}$$
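As a quick numerical check of this formula against the opening examples, here is a short Python sketch (standard library only; `math.comb` plays the role of ${}_nC_x$):

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Exactly two 5's in 20 die rolls: about 0.198
print(binom_pmf(2, 20, 1/6))
# Exactly 6 heads in 10 coin flips: 210/1024, about 0.205
print(binom_pmf(6, 10, 1/2))
```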
Note that this formula does indeed result in a valid PDF, as
${}_nC_x$, $p$, and $q$ are all greater than or equal to zero, so $P(X=x) \ge 0$ for every $x$, and
$\displaystyle{
\begin{array}{rcll}
\displaystyle{\sum_{x=0}^n P(X=x)} &=& \displaystyle{\sum_{x=0}^n {}_nC_xp^xq^{n-x}}\\
&=& (p+q)^n \quad &\textrm{by the Binomial Theorem}\\
&=& 1 \quad &\textrm{recalling that $q=1-p$}\\
\end{array}}$
Now that we have established that a binomial distribution results in a valid PDF, we can investigate what the mean, variance, and standard deviation for this distribution might be. The following results take a good bit of algebra to verify, but the mean and standard deviation of a binomial distribution are given by $$\mu = np \quad \textrm{ and } \quad \sigma = \sqrt{npq}$$ and the variance, of course, is always the square of the standard deviation (so, $\sigma^2 = npq$).
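While we omit that algebra here, the formulas are easy to check numerically; the sketch below (Python again, with $n$ and $p$ chosen to match the die-rolling example) confirms both that the probabilities sum to $1$ and that the mean and variance computed directly from their definitions agree with $np$ and $npq$:

```python
from math import comb, isclose, sqrt

def binom_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 1/6  # the die-rolling example
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * pmf[x] for x in range(n + 1))              # mu, by definition
var = sum((x - mean)**2 * pmf[x] for x in range(n + 1))   # sigma^2, by definition

assert isclose(sum(pmf), 1)           # a valid PDF: probabilities sum to 1
assert isclose(mean, n * p)           # mu = np
assert isclose(var, n * p * (1 - p))  # sigma^2 = npq
print(mean, sqrt(var))                # 3.333... and 1.666...
```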
While drawing with replacement (as in the example above involving a deck of cards) satisfies the independence criterion for a binomial experiment, drawing without replacement does not.
Consider the case where we draw 5 marbles from a bag that initially contains 10 red marbles and 10 black marbles. If our first marble drawn is red, the probability of drawing a red second marble is $9/19 \approx 0.473$. If our first marble drawn is black, the probability of drawing a red second marble is $10/19 \approx 0.526$. Clearly, the colors associated with the first and second draws are not independent events.
However, if we had 10000 red marbles and 10000 black marbles in our bag initially, consider the probabilities involved. If the first marble is red, the probability the second is red is $9999/19999 \approx 0.49997$ and if the first marble is black, the probability the second is red is $10000/19999 \approx 0.50003$. Yes, these are different values -- but not by very much.
Consequently, when the sample size is small enough relative to the population, we may treat drawing without replacement as if the draws were independent; any error introduced as a result is essentially negligible from a practical standpoint. As a rule of thumb, if the sample size is no more than 5% of the population, you may treat sampling without replacement as independent and still be fairly confident of your results.
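To see how small that error actually is, one can compare the exact without-replacement probability (a hypergeometric calculation) with the binomial value; the sketch below (Python, with the marble counts from the two scenarios above) does this for the probability of drawing exactly 3 red marbles in 5 draws:

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact P(exactly k red) when drawing n marbles without replacement
    from a bag of N marbles, K of which are red."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, k = 5, 3  # draw 5 marbles; ask for exactly 3 red
for N in (20, 20000):  # 10 red/10 black, then 10000 red/10000 black
    exact = hypergeom_pmf(k, N, N // 2, n)
    approx = binom_pmf(k, n, 1/2)
    print(f"N={N}: exact {exact:.5f} vs. binomial {approx:.5f}")
# With N=20 the sample is 25% of the population and the two values differ
# noticeably; with N=20000 it is only 0.025%, and they are nearly identical.
```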