The application of machine learning approaches to operations research problems has recently attracted considerable attention from researchers. Reinforcement learning is one of the main machine learning paradigms, in which the Markov Decision Process (MDP) is commonly used as a modelling framework. This study deals with a reconfigurable mixed-model assembly line where tasks can be dynamically assigned to stations at each takt, workers can move among stations at the end of each takt, and the sequence of incoming product models is infinite and unknown. Equipment is assigned to stations at the line design stage, and equipment duplication is allowed. The dynamic task assignment and worker movements among stations form an MDP that can be translated into a Linear Program (LP). As a result, the line design problem is formulated as a Mixed-Integer Linear Program (MILP) that embeds the MDP model. We propose reduction rules and a decomposed transition algorithm to reduce the model size. New MILP models taking stochastic parameters into account are built to solve the stochastic and robust optimization problems, with the objectives of minimizing the expected total cost over all takts and minimizing the total cost in the worst takt, respectively. Computational results on benchmark and generated instances demonstrate the performance of the proposed MDP models. Useful managerial insights and discussions are also provided.
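To make the MDP-to-LP translation concrete, the following is a minimal sketch of the standard occupancy-measure LP for an average-cost MDP under the usual unichain assumption; the symbols (state set S, action set A, transition kernel P, per-takt cost c) are generic textbook notation rather than the paper's own, and the paper's actual embedding may differ in detail:

\[
\begin{aligned}
\min_{x \ge 0}\ \ & \sum_{s \in S}\sum_{a \in A} c(s,a)\, x(s,a) \\
\text{s.t.}\ \ & \sum_{a \in A} x(s',a) \;=\; \sum_{s \in S}\sum_{a \in A} P(s' \mid s,a)\, x(s,a) \qquad \forall s' \in S, \\
& \sum_{s \in S}\sum_{a \in A} x(s,a) \;=\; 1.
\end{aligned}
\]

Here x(s,a) is the long-run fraction of takts in which the line is in state s and action a is taken, and an optimal stationary policy can be recovered as \(\pi(a \mid s) = x(s,a) / \sum_{a'} x(s,a')\) wherever the denominator is positive. Combining linear constraints of this form with binary design variables (e.g., equipment-to-station assignment) is what yields a MILP of the kind the abstract describes.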